xlm-roberta-multilingual-emotion-classifier

Model Description

This model is a fine‑tuned version of XLM‑RoBERTa‑base for 7‑class emotion detection in code‑mixed Pakistani text, including English, Roman Urdu, and Urdu script. It is designed for social media analysis, customer feedback, and multilingual NLP applications in the Pakistani context.

  • Base Model: xlm-roberta-base
  • Emotion Classes: anger, disgust, fear, joy, neutral, sad, surprise
  • Language Support: English, Roman Urdu, Urdu script

Training Data & Balancing Strategy

Data Sources

The training dataset was compiled from multiple sources:

| Source | Description |
|---|---|
| GoEmotions | Subset of the Google emotion dataset |
| Parul Pandey's Emotion Dataset | Primary source (most samples) |
| Roman Urdu‑English Code‑Switched Dataset | Specialised code‑mixed data |
| LLM‑generated samples (GPT, Grok, Gemini) | Used for English‑Urdu translation and augmentation |

Cleaning & Preprocessing

  • Duplicate rows removed
  • Standardised column structure across three languages (English, Urdu, Roman Urdu)
  • Seven emotion classes retained
  • Light spelling normalisation for Roman Urdu
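The cleaning steps above can be sketched as a small pandas pipeline. This is illustrative only: the column names (`sentence`, `label`, `language`) and the Roman Urdu spelling map are assumptions, not the dataset's actual schema.

```python
# Sketch of the cleaning pipeline described above. Column names and the
# spelling map are illustrative assumptions, not the dataset's real schema.
import pandas as pd

ROMAN_URDU_SPELLINGS = {  # hypothetical normalisation map
    "mein": "main",
    "nahin": "nahi",
}

EMOTIONS = {"anger", "disgust", "fear", "joy", "neutral", "sad", "surprise"}

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Drop exact duplicate rows.
    df = df.drop_duplicates()
    # 2. Standardise the column structure across the three languages.
    df = df.rename(columns={"sentence": "text", "label": "emotion"})
    # 3. Keep only the seven target emotion classes.
    df = df[df["emotion"].isin(EMOTIONS)].copy()
    # 4. Light spelling normalisation for Roman Urdu tokens.
    def normalise(text: str) -> str:
        return " ".join(ROMAN_URDU_SPELLINGS.get(w, w) for w in text.split())
    mask = df["language"] == "roman_urdu"
    df.loc[mask, "text"] = df.loc[mask, "text"].map(normalise)
    return df.reset_index(drop=True)
```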

Class Imbalance & Balancing Strategy

Initial raw data showed significant imbalance:

  • English: Fear and Sad dominated (14–16% each); Disgust was very low (7.1%).
  • Roman Urdu: Fear as low as 8.1%; Sad also relatively low.
  • Urdu: More balanced overall, though Sad was often the smallest class.

Targeted Augmentation (Weak Classes)

We augmented Sad, Disgust, Surprise (and Fear/Joy where needed per language) using:

  • Random Swap (EDA) for Urdu and Roman Urdu – swaps two random words, preserving sentiment while increasing structural variety.
  • Oversampling + Cross‑Lingual Injection for English Disgust – duplicated high‑quality samples and added code‑switched (English + Roman Urdu) variants.
  • LLM‑based translation (GPT, Grok, Gemini) to generate parallel English‑Urdu/Roman Urdu versions.
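The Random Swap operation from EDA is simple enough to sketch in full; the function below is a generic illustration of the technique, not the exact script used in training.

```python
# Generic sketch of EDA Random Swap, as applied to Urdu and Roman Urdu text.
import random

def random_swap(text: str, n_swaps: int = 1, seed=None) -> str:
    """Exchange two randomly chosen word positions n_swaps times.

    The label is left intact: only word order changes, which adds
    structural variety while preserving the sentence's sentiment.
    """
    rng = random.Random(seed)
    words = text.split()
    if len(words) < 2:
        return text  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)
```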

Approximate additional samples added:

  • Total synthetic/augmented samples: ~2,000–2,500
  • English Disgust: +334 samples (~30% of that class augmented)
  • Urdu/Roman Urdu weak classes (Surprise, Disgust): +400–500 samples each

Majority Class Control (Mild Downsampling)

We randomly under‑sampled the majority English classes (Anger, Fear, Sad) to prevent bias:

  • Original counts: ~1,400–1,600 each
  • Reduced by ~200–400 samples to a ceiling of ~1,200 each
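The downsampling step amounts to capping each majority class at a fixed ceiling; a minimal sketch (the ceiling of 1,200 is from the text above, the function itself is illustrative):

```python
# Mild downsampling: randomly keep at most `ceiling` samples per class.
# The ceiling of 1200 matches the text above; the helper is illustrative.
import random

def cap_class(samples, ceiling=1200, seed=42):
    if len(samples) <= ceiling:
        return list(samples)          # minority classes pass through untouched
    rng = random.Random(seed)
    return rng.sample(list(samples), ceiling)
```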

Quality Controls

  • Augmentation applied to the master dataset before the train/test split, so the test set also contains augmented variants and reported scores reflect performance on augmented text.
  • Stratified split ensures balanced representation across languages and emotions.
  • Manual spot‑check performed on augmented Roman Urdu samples.
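A stratified split over both language and emotion can be done by stratifying on the joint key, e.g. with scikit-learn (a sketch under the same assumed column names as above):

```python
# Sketch of a split stratified jointly on language and emotion.
# Column names ("language", "emotion") are assumptions.
from sklearn.model_selection import train_test_split

def stratified_split(df, test_size=0.2, seed=42):
    # Stratify on the combined key so BOTH dimensions stay balanced,
    # not just the emotion labels.
    strata = df["language"] + "_" + df["emotion"]
    return train_test_split(
        df, test_size=test_size, stratify=strata, random_state=seed
    )
```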

Final Dataset Statistics

Total rows: 20,235
Languages: English (7,734), Roman Urdu (5,572), Urdu (6,929)

Per‑language emotion counts (exact numbers used in training):

| Language | anger | disgust | fear | joy | neutral | sad | surprise | Total |
|---|---|---|---|---|---|---|---|---|
| English | 1128 | 1106 | 1207 | 1200 | 1004 | 1207 | 882 | 7734 |
| Roman Urdu | 839 | 791 | 839 | 839 | 839 | 691 | 734 | 5572 |
| Urdu | 1009 | 935 | 1059 | 980 | 948 | 1059 | 939 | 6929 |
| **Total** | 2976 | 2832 | 3105 | 3019 | 2791 | 2957 | 2555 | 20235 |

Global percentages:

  • Highest: Fear (~15.34%)
  • Lowest: Surprise (~12.63%)
  • Others: 13.8% – 14.9%

The final dataset is much more balanced than raw sources, with all classes represented across all three languages.
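The stated percentages follow directly from the per-class totals; a quick arithmetic check (counts taken verbatim from the table above):

```python
# Verify the global percentages from the per-class totals in the table above.
totals = {"anger": 2976, "disgust": 2832, "fear": 3105, "joy": 3019,
          "neutral": 2791, "sad": 2957, "surprise": 2555}
grand = sum(totals.values())                      # 20235
pct = {k: 100 * v / grand for k, v in totals.items()}
# pct["fear"] ≈ 15.34 (highest), pct["surprise"] ≈ 12.63 (lowest)
```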

Training Details

  • Framework: PyTorch + Hugging Face Transformers
  • Model: XLM‑RoBERTa‑base (12 layers, 278M parameters)
  • Optimizer: AdamW (learning rate = 2e‑5)
  • Batch size: 32
  • Epochs: 5 (early stopping patience = 2)
  • Max sequence length: 128 tokens
  • Warmup steps: 10% of total steps
  • Weight decay: 0.01
  • FP16: Enabled (GPU acceleration)
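The hyperparameters above map onto a Hugging Face `TrainingArguments` configuration roughly as follows. This is a sketch, not the actual training script: the output directory and per-epoch evaluation cadence are assumptions, and in newer transformers releases the `evaluation_strategy` keyword is named `eval_strategy`.

```python
# Config sketch implied by the bullet list above; paths and evaluation
# cadence are assumptions. Max sequence length (128) is applied at
# tokenization time, not here.
from transformers import EarlyStoppingCallback, TrainingArguments

args = TrainingArguments(
    output_dir="./emotion-model",   # placeholder path
    learning_rate=2e-5,             # AdamW is the Trainer default optimizer
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=5,
    warmup_ratio=0.1,               # warmup = 10% of total steps
    weight_decay=0.01,
    fp16=True,                      # requires a CUDA GPU
    evaluation_strategy="epoch",    # "eval_strategy" in newer releases
    save_strategy="epoch",
    load_best_model_at_end=True,    # required for early stopping
    metric_for_best_model="eval_loss",
)
# Early stopping with patience 2, as stated above.
callbacks = [EarlyStoppingCallback(early_stopping_patience=2)]
```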

Evaluation Results

The model was evaluated on a held‑out test set of 4,047 samples (20% of total data, stratified).

Overall Performance

  • Accuracy: 83.4%
  • Macro F1‑score: 0.835
  • Weighted F1‑score: 0.833
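The three headline metrics correspond to standard scikit-learn calls; a minimal sketch of how they would be computed from test-set predictions:

```python
# Standard computation of the three headline metrics with scikit-learn.
from sklearn.metrics import accuracy_score, f1_score

def emotion_metrics(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # macro: unweighted mean over the 7 classes
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
        # weighted: mean over classes weighted by support
        "weighted_f1": f1_score(y_true, y_pred, average="weighted"),
    }
```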

Per‑Class Performance

| Emotion | Precision | Recall | F1‑score | Support |
|---|---|---|---|---|
| anger | 0.83 | 0.87 | 0.85 | 595 |
| disgust | 0.90 | 0.93 | 0.91 | 567 |
| fear | 0.93 | 0.90 | 0.92 | 621 |
| joy | 0.82 | 0.79 | 0.81 | 604 |
| neutral | 0.77 | 0.75 | 0.76 | 558 |
| sad | 0.75 | 0.75 | 0.75 | 591 |
| surprise | 0.80 | 0.81 | 0.81 | 511 |

Confusion Matrix (Actual vs. Predicted)

| Actual ↓ / Predicted → | anger | disgust | fear | joy | neutral | sad | surprise |
|---|---|---|---|---|---|---|---|
| anger | 518 | 10 | 13 | 10 | 12 | 26 | 6 |
| disgust | 9 | 526 | 2 | 0 | 13 | 8 | 9 |
| fear | 7 | 3 | 560 | 0 | 0 | 2 | 49 |
| joy | 6 | 3 | 0 | 480 | 54 | 38 | 23 |
| neutral | 9 | 19 | 2 | 39 | 418 | 64 | 7 |
| sad | 52 | 23 | 2 | 29 | 34 | 443 | 8 |
| surprise | 21 | 2 | 23 | 30 | 13 | 8 | 414 |

Per‑Language Accuracy

| Language | Accuracy | Samples |
|---|---|---|
| English | 84.5% | 851 |
| Roman Urdu | 80.8% | 506 |
| Urdu | 84.0% | 681 |
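How to Use

A minimal usage sketch with the transformers `pipeline` API. The model identifier below is a placeholder; substitute the actual Hub repo id (with its namespace) or a local checkpoint directory.

```python
from transformers import pipeline

# "your-namespace/xlm-roberta-multilingual-emotion-classifier" is a
# placeholder; replace it with the real Hub repo id or a local checkpoint.
classifier = pipeline(
    "text-classification",
    model="your-namespace/xlm-roberta-multilingual-emotion-classifier",
)

# Works across English, Roman Urdu, and Urdu script.
print(classifier("mujhe bohat khushi ho rahi hai"))   # Roman Urdu
print(classifier("I can't believe this happened!"))   # English
```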