CDLI Parakeet TDT 1.1B English Fine-Tune (Uganda + Kenya)


This repository contains a NeMo ASR model fine-tuned from nvidia/parakeet-tdt-1.1b on a merged training mix of atypical English speech from Uganda and Kenya.

The setup mirrors the mixed-country 0.6B notebook, but swaps in the 1.1B checkpoint and keeps the same evaluation pattern: Uganda-only validation for selection, then separate held-out reporting for Uganda and Kenya.

Model Details

  • Base model: nvidia/parakeet-tdt-1.1b
  • Fine-tuning framework: NVIDIA NeMo
  • Language: English
  • Acoustic model family: FastConformer-TDT / RNNT-BPE
  • Output text: lower-case English transcription with standard ASR normalization

Datasets

  • Uganda: cdli/ugandan_english_nonstandard_speech_v1.0
  • Kenya: cdli/kenyan_english_nonstandard_speech_v1.0
  • License: cc-by-sa-4.0
  • Audio sampling rate: 16 kHz

Split sizes used in this run:

  • Uganda train: 5175
  • Uganda validation: 638
  • Uganda test: 1016
  • Kenya train: 4374
  • Kenya validation: 542
  • Kenya test: 928
  • Merged train total: 9549 rows, 56.46 hours
  • Validation selection set: Uganda only, 638 rows, 4.33 hours

Training Configuration

  • Work root: /jupyter_kernel/parakeet_cdli_en_ug_ke
  • Train mix: Uganda + Kenya
  • Primary validation country: Uganda
  • Max manifest audio length: 40.0 s
  • Max training audio length: 40.0 s
  • Min audio length: 0.2 s
  • Train batch size: 8
  • Eval batch size: 8
  • Gradient accumulation steps: 8
  • Effective train batch size: 64
  • Learning rate: 5e-5
  • Weight decay: 1e-3
  • Warmup steps: 100
  • Scheduler: CosineAnnealing
  • Max steps: 20000
  • Validation interval: 200 steps
  • Early stopping patience: 12
  • Precision: bf16-mixed when supported, otherwise mixed precision fallback
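As a hedged sketch, the optimization settings above map onto a NeMo-style optimizer config roughly as follows. The field names follow NeMo's optimizer/scheduler convention (normally an OmegaConf `DictConfig` passed to `model.setup_optimization`), but the optimizer name and exact config layout used in this run are assumptions; a plain dict is shown for clarity.

```python
# Sketch of the optimization config implied by the settings above.
optim_config = {
    "name": "adamw",  # assumed optimizer; not stated in the card
    "lr": 5e-5,
    "weight_decay": 1e-3,
    "sched": {
        "name": "CosineAnnealing",
        "warmup_steps": 100,
        "max_steps": 20000,
    },
}

# The effective train batch size is the per-device batch size multiplied
# by the number of gradient accumulation steps.
train_batch_size = 8
grad_accum_steps = 8
effective_batch = train_batch_size * grad_accum_steps  # 64
```

With validation every 200 steps and an early-stopping patience of 12, training stops after 2400 steps without improvement on the Uganda-only validation set.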

Evaluation

Evaluation was run separately on the held-out Uganda and Kenya test splits using both raw transcript comparison and normalized transcript comparison.
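The exact normalizer used for the "normalized" rows is not shown in this card, so the following is a minimal sketch under assumed conventions: lower-casing and punctuation stripping for normalization, and corpus-level WER computed as total word-level edit distance over total reference words.

```python
import re

def normalize(text):
    """Assumed normalization: lower-case, strip punctuation, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())

def edit_distance(ref, hyp):
    """Levenshtein distance over token lists (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def corpus_wer(refs, hyps, normalizer=None):
    """Corpus-level WER: total edits divided by total reference words."""
    edits = words = 0
    for ref, hyp in zip(refs, hyps):
        if normalizer:
            ref, hyp = normalizer(ref), normalizer(hyp)
        r, h = ref.split(), hyp.split()
        edits += edit_distance(r, h)
        words += len(r)
    return edits / max(words, 1)
```

Passing `normalizer=None` gives the "raw" comparison; passing `normalize` gives the "normalized" comparison reported below.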

Uganda Test Set

  • Raw WER: 33.14%
  • Raw CER: 15.53%
  • Normalized WER: 21.59%
  • Normalized CER: 12.92%
  • Average normalized utterance WER (capped at 1.0): 20.86%
  • Average normalized utterance CER (capped at 1.0): 12.88%

Kenya Test Set

  • Raw WER: 33.62%
  • Raw CER: 11.92%
  • Normalized WER: 12.45%
  • Normalized CER: 7.59%
  • Average normalized utterance WER (capped at 1.0): 12.48%
  • Average normalized utterance CER (capped at 1.0): 7.63%
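The "average normalized utterance WER (capped at 1.0)" rows average per-utterance WER after clipping each utterance's score at 1.0, so a single pathological hypothesis (e.g. a long hallucination against a short reference) cannot dominate the mean. A minimal self-contained sketch of that metric:

```python
def utt_wer(ref, hyp):
    """Word-level edit distance over reference length for one utterance."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i, rt in enumerate(r, 1):
        cur = [i]
        for j, ht in enumerate(h, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (rt != ht)))
        prev = cur
    return prev[-1] / max(len(r), 1)

def avg_capped_wer(refs, hyps, cap=1.0):
    """Mean per-utterance WER with each utterance's score clipped at `cap`."""
    scores = [min(utt_wer(r, h), cap) for r, h in zip(refs, hyps)]
    return sum(scores) / max(len(scores), 1)
```

This is why the capped averages above sit close to, but not exactly at, the corpus-level normalized WER: short utterances and outliers are weighted differently.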

Uganda and Kenya Comparison

The same shared 1.1B model performs much better on the Kenyan test split than on the Ugandan test split.

  • Kenya vs Uganda normalized WER: 12.45% vs 21.59%
  • Kenya vs Uganda normalized CER: 7.59% vs 12.92%
  • Absolute normalized WER gap: 9.14 points in favor of Kenya

Relative to the mixed-country 0.6B run (KasuleTrevor/cdli-parakeet-en-ug-ke), the 1.1B model is uneven:

  • Uganda normalized WER: 21.59% vs 21.33% for 0.6B
  • Kenya normalized WER: 12.45% vs 14.56% for 0.6B

So the larger model slightly regressed on Uganda (by 0.26 points) but improved Kenya materially, by 2.11 absolute normalized WER points.

Interpretation

This run is not a clean across-the-board win over the 0.6B model. The main benefit is stronger Kenyan generalization, while Uganda remains roughly flat to slightly worse.

That matters for model selection:

  • If Uganda remains the primary deployment target, the 0.6B mixed-country run is still competitive.
  • If the goal is broader East African English transfer, the 1.1B run is more interesting because of the Kenya gain.

Usage

from nemo.collections.asr.models import ASRModel

# Load the fine-tuned checkpoint from the Hugging Face Hub.
model = ASRModel.from_pretrained("KasuleTrevor/cdli-parakeet-en-ug-ke-1.1b")

# Transcribe 16 kHz mono audio files. Newer NeMo versions return Hypothesis
# objects with a .text attribute; older versions return plain strings.
predictions = model.transcribe(["path/to/audio.wav"])
print(predictions[0].text if hasattr(predictions[0], "text") else predictions[0])

Files

  • EN-PARAKEET-TDT-UG-KE-1.1b.nemo: exported NeMo checkpoint
  • results/EN-PARAKEET-TDT-UG-KE-1.1b/: uploaded result tables and breakdown CSVs
  • train_mix_summary_ug_ke.json
  • country_summary_ug_ke.csv
  • test_predictions_scored_uganda.csv
  • test_predictions_scored_kenya.csv
  • severity_breakdown_combined_ug_ke.csv
  • disorder_breakdown_combined_ug_ke.csv
  • etiology_breakdown_combined_ug_ke.csv

Notes

  • Validation for early stopping and checkpoint selection was Uganda-only, even though training used both Uganda and Kenya.
  • Source datasets are gated. Review the dataset terms before requesting access.
  • Result artifacts for both countries were uploaded to the Hub under results/EN-PARAKEET-TDT-UG-KE-1.1b/.