AfriqueQwen-14B
Model Overview
AfriqueQwen-14B is the flagship model of the AfriqueLLM suite—a collection of open language models adapted to 20 African languages through continued pre-training (CPT) on 27.5B tokens. This model is based on Qwen/Qwen3-14B-Base and has been specifically adapted for improved performance on African languages while maintaining strong capabilities in high-resource languages.
Our experiments show that Qwen 3 models achieve the best performance among all base models tested, better preserving performance in high-resource languages after CPT and achieving strong results on long-context tasks such as document-level translation.
Key Features
- Base Model: Qwen 3 14B Base
- Parameters: 14B
- Context Length: 32,768 tokens (native)
- Training Tokens: 27.5B tokens of carefully curated multilingual data
Supported Languages
AfriqueQwen-14B has been adapted for the following 20 African languages, plus the 4 high-resource languages listed below the table:
| Language | Code | Family | Script |
|---|---|---|---|
| Afrikaans | afr_Latn | Germanic | Latin |
| Swahili | swh_Latn | Bantu | Latin |
| Moroccan Arabic | ary_Arab | Semitic | Arabic |
| Somali | som_Latn | Cushitic | Latin |
| Amharic | amh_Ethi | Semitic | Ethiopic |
| Egyptian Arabic | arz_Arab | Semitic | Arabic |
| Hausa | hau_Latn | Chadic | Latin |
| Kinyarwanda | kin_Latn | Bantu | Latin |
| Zulu | zul_Latn | Bantu | Latin |
| Igbo | ibo_Latn | Volta-Niger | Latin |
| Plateau Malagasy | plt_Latn | Austronesian | Latin |
| Xhosa | xho_Latn | Bantu | Latin |
| Shona | sna_Latn | Bantu | Latin |
| Yoruba | yor_Latn | Volta-Niger | Latin |
| Nyanja | nya_Latn | Bantu | Latin |
| Southern Sotho | sot_Latn | Bantu | Latin |
| Tigrinya | tir_Ethi | Semitic | Ethiopic |
| Tunisian Arabic | aeb_Arab | Semitic | Arabic |
| Oromo | gaz_Latn | Cushitic | Latin |
| Tswana | tsn_Latn | Bantu | Latin |
High-resource languages (for catastrophic forgetting mitigation): English, French, Portuguese, Arabic
Training Data
Our training corpus combines multiple high-quality sources:
- African Monolingual Data (~22.8B tokens): FineWeb2, WURA, and MADLAD-400
- Code (~1B tokens): CornStack-Python for reasoning capabilities
- Mathematics (~1B tokens): FineMath-4+ for mathematical understanding
- Synthetic Data (~324M tokens): GPT-4.1 translated domain-specific content across 10 domains
We use UniMax sampling to create a balanced distribution, capping high-resource languages at approximately 1B tokens and upsampling lower-resource languages for up to five epochs.
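The exact mixture is produced by our data pipeline; as a rough illustration of UniMax-style allocation, the sketch below spreads a total token budget as uniformly as possible across languages while respecting a per-language epoch limit and hard cap. The function name is illustrative, and treating the ~1B cap and 5-epoch limit as simple parameters is an assumption.

```python
def unimax_budgets(available_tokens, total_budget, max_epochs=5.0, hard_cap=1e9):
    """Spread `total_budget` tokens as uniformly as possible across languages,
    never exceeding `max_epochs` passes over a language's data or `hard_cap`
    tokens for any single language (UniMax-style allocation sketch)."""
    budgets = {}
    remaining_budget = total_budget
    # Visit languages from least- to most-resourced so the caps bind first
    # and leftover budget flows to better-resourced languages.
    remaining = sorted(available_tokens, key=available_tokens.get)
    while remaining:
        lang = remaining.pop(0)
        fair_share = remaining_budget / (len(remaining) + 1)
        budgets[lang] = min(fair_share, available_tokens[lang] * max_epochs, hard_cap)
        remaining_budget -= budgets[lang]
    return budgets
```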
Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "McGill-NLP/AfriqueQwen-14B"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "Bawo ni o ṣe n ṣe?"  # Yoruba: "How are you doing?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate text
generated_ids = model.generate(
    **inputs,
    max_new_tokens=100
)
output = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(output)
```
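Note that this is a base (non-instruction-tuned) checkpoint, so few-shot prompting is usually more reliable than bare instructions. A minimal sketch, reusing the model and tokenizer loaded above; the English-to-Swahili example pairs are illustrative placeholders:

```python
# Few-shot English -> Swahili translation with the model and tokenizer above.
few_shot_prompt = (
    "English: Good morning.\nSwahili: Habari za asubuhi.\n\n"
    "English: Thank you very much.\nSwahili: Asante sana.\n\n"
    "English: Where is the market?\nSwahili:"
)
inputs = tokenizer(few_shot_prompt, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=30, do_sample=False)
completion = tokenizer.decode(
    generated_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion.split("\n")[0])  # keep only the first generated line
```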
Deployment
For deployment, you can use vLLM or SGLang to create an OpenAI-compatible API endpoint:

vLLM:

```bash
vllm serve McGill-NLP/AfriqueQwen-14B
```

SGLang:

```bash
python -m sglang.launch_server --model-path McGill-NLP/AfriqueQwen-14B
```
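Both servers expose an OpenAI-compatible API, so any OpenAI client can query the endpoint. A minimal sketch using the completions endpoint (appropriate for a base model); the base URL, port, and api_key value are assumptions and should match your server configuration:

```python
from openai import OpenAI

# Assumes the server is reachable at this address (vLLM's default port is 8000);
# adjust base_url and api_key to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.completions.create(
    model="McGill-NLP/AfriqueQwen-14B",
    prompt="Habari ya leo ni kwamba",  # Swahili: "Today's news is that"
    max_tokens=100,
    temperature=0.7,
)
print(response.choices[0].text)
```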
Training Details
Hyperparameters
- Learning Rate: 5e-5 (with warmup and cosine decay; see the schedule sketch after this list)
- Context Length: 16,384 tokens
- Optimizer: AdamW
- Precision: BF16 mixed precision
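For reference, the warmup-plus-cosine-decay schedule can be reproduced with standard PyTorch and transformers utilities. A minimal sketch; the parameter group and the step counts are placeholders, not the actual training configuration:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Placeholder parameters and step counts, for illustration only.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=5e-5)  # peak learning rate from above
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,       # placeholder warmup length
    num_training_steps=50_000,  # placeholder total optimizer steps
)
# Inside the training loop, call optimizer.step() followed by scheduler.step().
```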
Infrastructure
Training was conducted using the LLaMA-Factory framework on up to 64 NVIDIA H100 GPUs with:
- DeepSpeed ZeRO-1/ZeRO-2
- Flash Attention 3
- Sequence packing (illustrated after this list)
- Liger Kernel optimizations
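As an illustration of what sequence packing does (concatenating variable-length documents into fixed-length training sequences to avoid padding waste), here is a simplified greedy packer. It is not the LLaMA-Factory implementation, and the EOS token id used below is an assumption:

```python
def pack_sequences(tokenized_docs, max_len=16_384, eos_id=151643):
    """Greedily concatenate tokenized documents (lists of token ids) into
    blocks of at most `max_len` tokens, separating documents with an EOS
    token. Simplified sketch; 151643 is assumed to be the Qwen EOS id."""
    blocks, current = [], []
    for doc in tokenized_docs:
        doc = doc[: max_len - 1]  # truncate documents longer than one block
        if current and len(current) + len(doc) + 1 > max_len:
            blocks.append(current)
            current = []
        current = current + doc + [eos_id]
    if current:
        blocks.append(current)
    return blocks
```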
Evaluation
All AfriqueLLM models are evaluated on multiple multilingual benchmarks. The Δ column reports each adapted model's absolute (and relative) gain in the overall average over its corresponding base model:
| Model | AfriMGSM | AfriMMLU | AfriXNLI | Belebele | FLORES | INJONG | SIB-200 | Overall | Δ |
|---|---|---|---|---|---|---|---|---|---|
| Gemma3-4B | 10.24 | 33.89 | 37.76 | 45.79 | 29.50 | 55.52 | 63.59 | 39.47 | |
| AfriqueGemma-4B | 14.86 | 36.73 | 39.62 | 50.52 | 57.31 | 69.28 | 69.21 | 48.22 | +8.7 (22.2%) |
| Gemma3-12B | 25.21 | 48.76 | 44.01 | 68.84 | 40.16 | 73.53 | 79.17 | 54.24 | |
| AfriqueGemma-12B | 32.14 | 49.47 | 44.60 | 68.65 | 66.89 | 76.79 | 75.08 | 59.09 | +4.8 (8.9%) |
| Qwen3-8B | 11.22 | 36.56 | 38.24 | 44.63 | 18.93 | 29.47 | 53.06 | 33.16 | |
| AfriqueQwen-8B | 39.68 | 46.91 | 45.99 | 68.46 | 63.54 | 73.36 | 77.00 | 59.28 | +26.1 (78.8%) |
| Qwen3-14B-Base | 16.60 | 39.66 | 43.22 | 50.74 | 20.86 | 41.80 | 66.29 | 39.88 | |
| AfriqueQwen-14B | 45.01 | 52.22 | 49.01 | 74.63 | 65.26 | 77.80 | 82.63 | 63.79 | +23.9 (60.0%) |
| Llama3.1-8B | 8.14 | 32.27 | 37.90 | 40.95 | 23.59 | 41.37 | 59.99 | 34.89 | |
| AfriqueLlama-8B | 17.51 | 36.57 | 37.39 | 50.51 | 64.88 | 71.17 | 69.14 | 49.60 | +14.7 (42.2%) |
Model Variants
- AfriqueQwen-8B - Smaller 8B variant
- AfriqueGemma-4B - Gemma-based 4B model
- AfriqueGemma-12B - Gemma-based 12B model
- AfriqueLlama-8B - Llama-based 8B model
License
This model is released under the CC BY 4.0 License. Please review the license terms before use.
Acknowledgments
We thank the creators of the base models, datasets, and compute resources that made this work possible, including Mila, Compute Canada, Microsoft, the FineWeb team, WURA, and MADLAD-400, among others.