AfriqueQwen-14B

Model Overview

AfriqueQwen-14B is the flagship model of the AfriqueLLM suite—a collection of open language models adapted to 20 African languages through continued pre-training (CPT) on 27.5B tokens. This model is based on Qwen/Qwen3-14B-Base and has been specifically adapted for improved performance on African languages while maintaining strong capabilities in high-resource languages.

In our experiments, Qwen 3 models achieved the best results among all base models tested: they better preserve high-resource-language capabilities after CPT and achieve strong results on long-context tasks such as document-level translation.

Key Features

  • Base Model: Qwen 3 14B Base
  • Parameters: 14B
  • Context Length: 32,768 tokens (native)
  • Training Tokens: 27.5B tokens of carefully curated multilingual data

Supported Languages

AfriqueQwen-14B has been adapted for the following 20 African languages plus 4 high-resource languages:

| Language | Code | Family | Script |
|---|---|---|---|
| Afrikaans | afr_Latn | Germanic | Latin |
| Swahili | swh_Latn | Bantu | Latin |
| Moroccan Arabic | ary_Arab | Semitic | Arabic |
| Somali | som_Latn | Cushitic | Latin |
| Amharic | amh_Ethi | Semitic | Ethiopic |
| Egyptian Arabic | arz_Arab | Semitic | Arabic |
| Hausa | hau_Latn | Chadic | Latin |
| Kinyarwanda | kin_Latn | Bantu | Latin |
| Zulu | zul_Latn | Bantu | Latin |
| Igbo | ibo_Latn | Volta-Niger | Latin |
| Plateau Malagasy | plt_Latn | Austronesian | Latin |
| Xhosa | xho_Latn | Bantu | Latin |
| Shona | sna_Latn | Bantu | Latin |
| Yoruba | yor_Latn | Volta-Niger | Latin |
| Nyanja | nya_Latn | Bantu | Latin |
| Southern Sotho | sot_Latn | Bantu | Latin |
| Tigrinya | tir_Ethi | Semitic | Ethiopic |
| Tunisian Arabic | aeb_Arab | Semitic | Arabic |
| Oromo | gaz_Latn | Cushitic | Latin |
| Tswana | tsn_Latn | Bantu | Latin |

High-resource languages (included to mitigate catastrophic forgetting): English, French, Portuguese, Arabic

Training Data

Our training corpus combines multiple high-quality sources:

  • African Monolingual Data (~22.8B tokens): FineWeb2, WURA, and MADLAD-400
  • Code (~1B tokens): CornStack-Python for reasoning capabilities
  • Mathematics (~1B tokens): FineMath-4+ for mathematical understanding
  • Synthetic Data (~324M tokens): GPT-4.1-translated domain-specific content across 10 domains

We use UniMax sampling to create a balanced distribution, capping high-resource languages at approximately 1B tokens and upsampling lower-resource languages for up to five epochs.
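
As a rough illustration only (the actual corpus-construction code is not part of this card), the sketch below shows the UniMax idea: distribute a token budget as uniformly as possible across languages, never exceeding a fixed number of epochs over any language's data, with an optional hard cap such as the ~1B-token limit on high-resource languages. The corpus sizes, budget, and caps in the example are made up.

def unimax_allocation(available_tokens, total_budget, max_epochs=5, hard_caps=None):
    """Split total_budget across languages as uniformly as possible, capping
    each language at max_epochs passes over its data or an explicit hard cap."""
    hard_caps = hard_caps or {}
    caps = {
        lang: min(n_tokens * max_epochs, hard_caps.get(lang, float("inf")))
        for lang, n_tokens in available_tokens.items()
    }
    allocation, remaining = {}, total_budget
    # Handle the most tightly capped languages first so that any budget they
    # cannot absorb flows to languages with more data available.
    for i, (lang, cap) in enumerate(sorted(caps.items(), key=lambda kv: kv[1])):
        uniform_share = remaining / (len(caps) - i)
        allocation[lang] = min(uniform_share, cap)
        remaining -= allocation[lang]
    return allocation

# Illustrative token counts, not the real AfriqueLLM corpus statistics.
corpus = {"tir_Ethi": 0.2e9, "gaz_Latn": 0.5e9, "swh_Latn": 2.0e9, "eng_Latn": 50.0e9}
print(unimax_allocation(corpus, total_budget=5.0e9, hard_caps={"eng_Latn": 1.0e9}))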

Quickstart

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "McGill-NLP/AfriqueQwen-14B"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "Bawo ni o á¹£e n á¹£e?"  # Yoruba: "How are you doing?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate text
generated_ids = model.generate(
    **inputs,
    max_new_tokens=100
)
output = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(output)
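
Because AfriqueQwen-14B is adapted through continued pre-training rather than instruction tuning, it behaves as a text-completion model, and sampling usually gives less repetitive continuations than greedy decoding. The generation settings below are illustrative, not values recommended by the authors:

generated_ids = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,   # sample instead of greedy decoding
    temperature=0.7,  # illustrative values; tune for your use case
    top_p=0.9,
)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))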

Deployment

For deployment, you can use vLLM or SGLang to create an OpenAI-compatible API endpoint:

vLLM:

vllm serve McGill-NLP/AfriqueQwen-14B

SGLang:

python -m sglang.launch_server --model-path McGill-NLP/AfriqueQwen-14B
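
Once a server is running (vLLM listens on port 8000 by default, SGLang on 30000), any OpenAI-compatible client can query the endpoint. The snippet below is a minimal example using the openai Python package against a local vLLM server; the prompt and generation settings are illustrative.

from openai import OpenAI

# Local vLLM endpoint started with the `vllm serve` command above
# (default port 8000); a local server needs no real API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="McGill-NLP/AfriqueQwen-14B",
    prompt="Habari ya leo ni kwamba",  # Swahili: "Today's news is that"
    max_tokens=100,
    temperature=0.7,
)
print(response.choices[0].text)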

Training Details

Hyperparameters

  • Learning Rate: 5e-5 with warmup and cosine decay (a scheduler sketch follows this list)
  • Context Length: 16,384 tokens
  • Optimizer: AdamW
  • Precision: BF16 mixed precision
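
The card reports the optimizer, learning rate, schedule shape, and precision, but not the framework wiring. The sketch below shows one way to set up AdamW at 5e-5 with warmup and cosine decay using Hugging Face Transformers utilities; the warmup length and total step count are placeholders, and the actual runs used LLaMA-Factory rather than a hand-written loop.

import torch
from transformers import get_cosine_schedule_with_warmup

# Placeholder schedule lengths; the card does not state warmup or total steps.
total_steps = 10_000
warmup_steps = 500

# `model` is the AfriqueQwen-14B model loaded in the Quickstart above.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
)

# One BF16 training step then looks roughly like:
# with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
#     loss = model(**batch).loss
# loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()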

Infrastructure

Training was conducted using the LLaMA-Factory framework on up to 64 NVIDIA H100 GPUs with the following optimizations (an illustrative configuration sketch follows the list):

  • DeepSpeed ZeRO-1/ZeRO-2
  • Flash Attention 3
  • Sequence packing
  • Liger Kernel optimizations
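
For reference, a minimal DeepSpeed ZeRO-2 configuration with BF16 looks roughly like the dictionary below; this is an illustrative sketch using standard DeepSpeed options, not the configuration actually used with LLaMA-Factory.

# Illustrative DeepSpeed ZeRO-2 + BF16 settings ("auto" values are resolved by
# the Hugging Face / LLaMA-Factory integration at launch time).
ds_config = {
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}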

Evaluation

All AfriqueLLM models are evaluated on multiple multilingual benchmarks:

| Model | AfriMGSM | AfriMMLU | AfriXNLI | Belebele | FLORES | INJONG | SIB-200 | Overall | Δ vs. base |
|---|---|---|---|---|---|---|---|---|---|
| Gemma3-4B | 10.24 | 33.89 | 37.76 | 45.79 | 29.50 | 55.52 | 63.59 | 39.47 | |
| AfriqueGemma-4B | 14.86 | 36.73 | 39.62 | 50.52 | 57.31 | 69.28 | 69.21 | 48.22 | +8.7 (22.2%) |
| Gemma3-12B | 25.21 | 48.76 | 44.01 | 68.84 | 40.16 | 73.53 | 79.17 | 54.24 | |
| AfriqueGemma-12B | 32.14 | 49.47 | 44.60 | 68.65 | 66.89 | 76.79 | 75.08 | 59.09 | +4.8 (8.9%) |
| Qwen3-8B | 11.22 | 36.56 | 38.24 | 44.63 | 18.93 | 29.47 | 53.06 | 33.16 | |
| AfriqueQwen-8B | 39.68 | 46.91 | 45.99 | 68.46 | 63.54 | 73.36 | 77.00 | 59.28 | +26.1 (78.8%) |
| Qwen3-14B-Base | 16.60 | 39.66 | 43.22 | 50.74 | 20.86 | 41.80 | 66.29 | 39.88 | |
| AfriqueQwen-14B | 45.01 | 52.22 | 49.01 | 74.63 | 65.26 | 77.80 | 82.63 | 63.79 | +23.9 (60.0%) |
| Llama3.1-8B | 8.14 | 32.27 | 37.90 | 40.95 | 23.59 | 41.37 | 59.99 | 34.89 | |
| AfriqueLlama-8B | 17.51 | 36.57 | 37.39 | 50.51 | 64.88 | 71.17 | 69.14 | 49.60 | +14.7 (42.2%) |

Model Variants

AfriqueQwen-14B is part of the AfriqueLLM suite, which also includes AfriqueGemma-4B, AfriqueGemma-12B, AfriqueQwen-8B, and AfriqueLlama-8B (see the evaluation table above for their base models and scores).

License

This model is released under the CC BY 4.0 License. Please review the license terms before use.

Acknowledgments

We thank the creators of the base models, datasets, and compute resources that made this work possible, including Mila, Compute Canada, Microsoft, the FineWeb team, WURA, and MADLAD-400, among others.
