Zero-Shot Image Classification
Transformers
PyTorch
Safetensors
vision-text-dual-encoder
image generation
visual qa
text-image embedding
image-text embedding
sartify
visual conversational ai
image semantic retrieval
african low-resourced languages
How to use sartifyllc/AViLaMa with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("zero-shot-image-classification", model="sartifyllc/AViLaMa")
pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModel

processor = AutoProcessor.from_pretrained("sartifyllc/AViLaMa")
model = AutoModel.from_pretrained("sartifyllc/AViLaMa")
```
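The pipeline call returns one score per candidate label. For a CLIP-style dual encoder (the model is tagged vision-text-dual-encoder), such scores are commonly obtained by comparing one image embedding against one text embedding per label. The sketch below shows that scoring step on plain Python lists, assuming the embeddings have already been computed. The `logit_scale` default of 100 and the toy vectors are illustrative assumptions, not values taken from AViLaMa or from the `transformers` pipeline internals.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(
    image_emb: Sequence[float],
    label_embs: Dict[str, Sequence[float]],
    logit_scale: float = 100.0,  # assumption: CLIP-style scale, not read from AViLaMa
) -> Dict[str, float]:
    """Rank candidate labels for one image by scaled cosine similarity + softmax."""
    img = l2_normalize(image_emb)
    names = list(label_embs)
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, l2_normalize(label_embs[name])))
        for name in names
    ]
    return dict(zip(names, softmax(logits)))


# Hypothetical toy embeddings (NOT produced by AViLaMa), 3-D for readability.
image = [0.9, 0.1, 0.0]
labels = {
    "animals": [1.0, 0.0, 0.0],
    "humans": [0.0, 1.0, 0.0],
    "landscape": [0.0, 0.0, 1.0],
}
scores = zero_shot_scores(image, labels)
print(max(scores, key=scores.get))  # animals
```

Normalizing both sides first turns the dot product into cosine similarity, and the large scale factor sharpens the softmax so a clearly closer label receives nearly all of the probability mass.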
Update README.md
README.md (changed)

```diff
@@ -30,7 +30,7 @@ library_name: transformers
 ---
 
 # AViLaMa : African Vision-Languages Aligment Pre-Training Model.
 
-Learning Visual Concepts Directly From African Languages Supervision. [
+Learning Visual Concepts Directly From African Languages Supervision. [Paper is coming]()
 
 ## Model Details
 AViLaMa is the large open-source text-vision alignment pre-training model in African languages. It brings a way to learn visual concepts directly from African languages supervision. Inspired from OpenAI CLIP, but with more modalities like video, audio, etc.. and other techniques like agnostic languages encoding, data filtering network. All for more than 12 African languages, trained on the #AViLaDa-2B datasets of filtered image, video, audio-text pairs. We are also working to make it usable in directly vision-vision tasks.
```
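The model card describes AViLaMa as inspired by OpenAI CLIP. CLIP aligns its two encoders with a symmetric contrastive (InfoNCE) loss over a batch of matched pairs: each image should score highest against its own caption, and each caption against its own image. The sketch below implements that CLIP-style objective in plain Python for illustration only. The card does not state AViLaMa's exact training loss, and the `temperature` default of 0.07 (CLIP's initial value) is an assumption.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax_at(row: Sequence[float], target: int) -> float:
    """log(softmax(row)[target]) computed with the log-sum-exp trick."""
    m = max(row)
    lse = m + math.log(sum(math.exp(z - m) for z in row))
    return row[target] - lse


def clip_contrastive_loss(
    image_embs: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
    temperature: float = 0.07,  # assumption: CLIP's initial temperature
) -> float:
    """Symmetric InfoNCE: pair i is the positive for row i and for column i."""
    if len(image_embs) != len(text_embs):
        raise ValueError("need exactly one caption per image")
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text: cross-entropy over each row; the target is the matching caption.
    loss_i2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-image: cross-entropy over each column (rows of the transposed matrix).
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax_at(cols[j], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


# Hypothetical 2-D embeddings: captions in the right order vs. swapped.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
print(clip_contrastive_loss(images, aligned_texts) < clip_contrastive_loss(images, swapped_texts))  # True
```

Swapping the captions makes every positive pair score lowest, so the loss rises sharply. In a real training loop the same computation would run on batched tensors with automatic differentiation rather than Python lists.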