AfriqueGemma-4B
Model Overview
AfriqueGemma-4B is part of the AfriqueLLM suite—a collection of open language models adapted to 20 African languages through continued pre-training (CPT) on 25.2B tokens. This model is based on google/gemma-3-4b-pt and has been specifically adapted for improved performance on African languages while maintaining strong capabilities in high-resource languages.
Key Features
- Type: Causal Language Model (Base/Pre-trained)
- Base Model: Gemma 3 4B PT
- Parameters: 4B
- Context Length: 8,192 tokens (native)
- Training Tokens: 25.2B tokens of carefully curated multilingual data
Supported Languages
AfriqueGemma-4B has been adapted for the following 20 African languages, plus four high-resource languages listed after the table:
| Language | Code | Family | Script |
|---|---|---|---|
| Afrikaans | afr_Latn | Germanic | Latin |
| Swahili | swh_Latn | Bantu | Latin |
| Moroccan Arabic | ary_Arab | Semitic | Arabic |
| Somali | som_Latn | Cushitic | Latin |
| Amharic | amh_Ethi | Semitic | Ethiopic |
| Egyptian Arabic | arz_Arab | Semitic | Arabic |
| Hausa | hau_Latn | Chadic | Latin |
| Kinyarwanda | kin_Latn | Bantu | Latin |
| Zulu | zul_Latn | Bantu | Latin |
| Igbo | ibo_Latn | Volta-Niger | Latin |
| Plateau Malagasy | plt_Latn | Austronesian | Latin |
| Xhosa | xho_Latn | Bantu | Latin |
| Shona | sna_Latn | Bantu | Latin |
| Yoruba | yor_Latn | Volta-Niger | Latin |
| Nyanja | nya_Latn | Bantu | Latin |
| Southern Sotho | sot_Latn | Bantu | Latin |
| Tigrinya | tir_Ethi | Semitic | Ethiopic |
| Tunisian Arabic | aeb_Arab | Semitic | Arabic |
| Oromo | gaz_Latn | Cushitic | Latin |
| Tswana | tsn_Latn | Bantu | Latin |
High-resource languages (included to mitigate catastrophic forgetting): English, French, Portuguese, and Arabic.
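The supported languages span Latin, Arabic, and Ethiopic scripts. A quick way to sanity-check script coverage is to inspect how the tokenizer segments text in each script; a minimal sketch (the sample sentences are illustrative and not taken from the training data):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("McGill-NLP/AfriqueGemma-4B")

# One illustrative sentence per script family.
samples = {
    "swh_Latn": "Habari ya asubuhi",  # Swahili: "Good morning" (Latin)
    "arz_Arab": "صباح الخير",          # Egyptian Arabic: "Good morning" (Arabic)
    "amh_Ethi": "እንደምን አደርክ",         # Amharic greeting (Ethiopic)
}
for code, text in samples.items():
    tokens = tokenizer.tokenize(text)
    print(f"{code}: {len(tokens)} tokens -> {tokens}")
```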
Training Data
Our training corpus combines multiple high-quality sources:
- African Monolingual Data (~22.8B tokens): FineWeb2, WURA, and MADLAD-400
- Code (~1B tokens): CornStack-Python for reasoning capabilities
- Mathematics (~1B tokens): FineMath-4+ for mathematical understanding
- Synthetic Data (~324M tokens): domain-specific content spanning 10 domains, translated with GPT-4.1
We use UniMax sampling to create a balanced distribution, capping high-resource languages at approximately 1B tokens and upsampling lower-resource languages for up to five epochs.
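The sketch below illustrates the UniMax idea in simplified form: the token budget is split uniformly across languages, each language's allocation is capped at a maximum number of passes over its corpus, and the surplus is redistributed among the uncapped languages. This is an illustration only, with hypothetical corpus sizes; the actual pipeline additionally applies the ~1B-token cap on high-resource languages noted above.

```python
def unimax_allocation(corpus_tokens, budget, max_epochs=5.0):
    """Split a token budget uniformly across languages, capping each
    language at max_epochs passes over its corpus and redistributing
    the surplus among the languages that are not yet capped."""
    alloc = {}
    remaining = dict(corpus_tokens)
    while remaining:
        share = budget / len(remaining)
        # Languages whose epoch cap falls below the uniform share get capped.
        capped = {lang: n * max_epochs for lang, n in remaining.items()
                  if n * max_epochs <= share}
        if not capped:
            alloc.update({lang: share for lang in remaining})
            return alloc
        for lang, cap in capped.items():
            alloc[lang] = cap
            budget -= cap
            del remaining[lang]
    return alloc

# Hypothetical per-language corpus sizes (tokens).
sizes = {"swh": 2e9, "yor": 0.3e9, "tir": 0.05e9}
print(unimax_allocation(sizes, budget=3e9))
```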
Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "McGill-NLP/AfriqueGemma-4B"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Prepare the model input
prompt = "Bawo ni o ṣe n ṣe?"  # Yoruba: "How are you doing?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate text
generated_ids = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
output = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(output)
```
Deployment
For deployment, you can use vLLM or SGLang to serve an OpenAI-compatible API endpoint:

vLLM:
```bash
vllm serve McGill-NLP/AfriqueGemma-4B
```

SGLang:
```bash
python -m sglang.launch_server --model-path McGill-NLP/AfriqueGemma-4B
```
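Once a server is running, any OpenAI-compatible client can query it. A minimal sketch using the openai Python package, assuming vLLM's default address (http://localhost:8000/v1; SGLang defaults to port 30000). Since this is a base model rather than an instruct model, use the completions endpoint instead of chat:

```python
from openai import OpenAI

# The API key is a placeholder unless the server was configured with one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="McGill-NLP/AfriqueGemma-4B",
    prompt="Habari yako?",  # Swahili: "How are you?"
    max_tokens=100,
    temperature=0.7,
)
print(completion.choices[0].text)
```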
Training Details
Hyperparameters
- Learning Rate: 5e-5 (warmup followed by cosine decay; sketched after this list)
- Context Length: 16,384 tokens
- Optimizer: AdamW
- Precision: BF16 mixed precision
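For reference, this corresponds to the standard warmup-then-cosine schedule; a minimal sketch with PyTorch and transformers, where the warmup and total step counts are hypothetical since they are not reported above:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# `model` as loaded in the Quickstart; step counts below are assumed values.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,        # assumed warmup length
    num_training_steps=50_000,   # assumed total training steps
)
```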
Infrastructure
Training was conducted with the LLaMA-Factory framework on 2 nodes of 8 NVIDIA H100 GPUs each, using:
- DeepSpeed ZeRO-1/ZeRO-2
- Flash Attention 3
- Sequence packing (see the sketch after this list)
- Liger Kernel optimizations
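Of these, sequence packing is worth a brief illustration: many short tokenized documents are concatenated (separated by EOS) into full-length training sequences so that no compute is spent on padding. A simplified sketch of the idea; LLaMA-Factory's actual implementation also handles attention masking across document boundaries:

```python
def pack_sequences(token_lists, seq_len, eos_id):
    """Greedily concatenate tokenized documents (separated by EOS) into
    fixed-length blocks, discarding the final partial block."""
    buffer, blocks = [], []
    for tokens in token_lists:
        buffer.extend(tokens + [eos_id])
        while len(buffer) >= seq_len:
            blocks.append(buffer[:seq_len])
            buffer = buffer[seq_len:]
    return blocks

docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]
print(pack_sequences(docs, seq_len=4, eos_id=0))
# [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 10]]
```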
Evaluation
All AfriqueLLM models are evaluated on multiple multilingual benchmarks:
| Model | AfriMGSM | AfriMMLU | AfriXNLI | Belebele | FLORES | INJONG | SIB-200 | Overall | Δ |
|---|---|---|---|---|---|---|---|---|---|
| Gemma3-4B | 10.24 | 33.89 | 37.76 | 45.79 | 29.50 | 55.52 | 63.59 | 39.47 | |
| AfriqueGemma-4B | 14.86 | 36.73 | 39.62 | 50.52 | 57.31 | 69.28 | 69.21 | 48.22 | +8.7 (22.2%) |
| Gemma3-12B | 25.21 | 48.76 | 44.01 | 68.84 | 40.16 | 73.53 | 79.17 | 54.24 | |
| AfriqueGemma-12B | 32.14 | 49.47 | 44.60 | 68.65 | 66.89 | 76.79 | 75.08 | 59.09 | +4.8 (8.9%) |
| Qwen3-8B | 11.22 | 36.56 | 38.24 | 44.63 | 18.93 | 29.47 | 53.06 | 33.16 | |
| AfriqueQwen-8B | 39.68 | 46.91 | 45.99 | 68.46 | 63.54 | 73.36 | 77.00 | 59.28 | +26.1 (78.8%) |
| Qwen3-14B-Base | 16.60 | 39.66 | 43.22 | 50.74 | 20.86 | 41.80 | 66.29 | 39.88 | |
| AfriqueQwen-14B | 45.01 | 52.22 | 49.01 | 74.63 | 65.26 | 77.80 | 82.63 | 63.79 | +23.9 (60.0%) |
| Llama3.1-8B | 8.14 | 32.27 | 37.90 | 40.95 | 23.59 | 41.37 | 59.99 | 34.89 | |
| AfriqueLlama-8B | 17.51 | 36.57 | 37.39 | 50.51 | 64.88 | 71.17 | 69.14 | 49.60 | +14.7 (42.2%) |
Model Variants
- AfriqueGemma-12B - Larger 12B variant
- AfriqueQwen-8B - Qwen-based 8B model
- AfriqueQwen-14B - Qwen-based 14B model (flagship)
- AfriqueLlama-8B - Llama-based 8B model
License
This model is released under the CC BY 4.0 License. Please review the license terms before use.
Acknowledgments
We thank the creators of the base models, datasets, and compute resources that made this work possible, including Mila, Compute Canada, Microsoft, the FineWeb team, WURA, and MADLAD-400, among others.