CDLI Parakeet TDT 1.1B English Fine-Tune (Uganda + Kenya)
This repository contains a NeMo ASR model fine-tuned from
nvidia/parakeet-tdt-1.1b on a merged atypical-English training mix from
Uganda and Kenya.
The setup mirrors the mixed-country 0.6B notebook, but swaps in the 1.1B
checkpoint and keeps the same evaluation pattern: Uganda-only validation for
selection, then separate held-out reporting for Uganda and Kenya.
Model Details
- Base model: nvidia/parakeet-tdt-1.1b
- Fine-tuning framework: NVIDIA NeMo
- Language: English
- Acoustic model family: FastConformer-TDT / RNNT-BPE
- Output text: lower-case English transcription with standard ASR normalization
Datasets
- Uganda: cdli/ugandan_english_nonstandard_speech_v1.0
- Kenya: cdli/kenyan_english_nonstandard_speech_v1.0
- License: cc-by-sa-4.0
- Audio sampling rate: 16 kHz
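Both corpora are gated on the Hub (see Notes below). Assuming access has already been granted, a minimal loading sketch with the Hugging Face `datasets` library:

```python
# Minimal loading sketch; assumes access to the gated CDLI datasets has been
# granted and that you are authenticated (e.g. via `huggingface-cli login`).
from datasets import load_dataset

uganda = load_dataset("cdli/ugandan_english_nonstandard_speech_v1.0")
kenya = load_dataset("cdli/kenyan_english_nonstandard_speech_v1.0")

# Inspect split sizes and schema before building training manifests.
print(uganda)
print(kenya)
```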
Split sizes used in this run:
- Uganda train: 5175
- Uganda validation: 638
- Uganda test: 1016
- Kenya train: 4374
- Kenya validation: 542
- Kenya test: 928
- Merged train total: 9549 rows, 56.46 hours
- Validation selection set: Uganda only, 638 rows, 4.33 hours
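The two country train splits were concatenated for training (5175 + 4374 = 9549 rows). A hedged sketch of that merge as a NeMo-style JSON-lines manifest, applying the length filters listed in the training configuration below; the input manifest paths are assumptions:

```python
# Hedged sketch of merging the per-country train manifests into one NeMo-style
# JSON-lines manifest. Field names follow NeMo's standard manifest schema
# (audio_filepath, duration, text); the file paths are assumptions.
import json

def merge_manifests(paths, out_path, max_duration=40.0, min_duration=0.2):
    """Concatenate manifests, dropping utterances outside the length limits."""
    kept = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                for line in f:
                    entry = json.loads(line)
                    if min_duration <= entry["duration"] <= max_duration:
                        out.write(json.dumps(entry) + "\n")
                        kept += 1
    return kept

total = merge_manifests(
    ["train_uganda.json", "train_kenya.json"], "train_ug_ke.json"
)
print(f"merged train rows: {total}")  # 9549 in this run
```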
Training Configuration
- Work root: /jupyter_kernel/parakeet_cdli_en_ug_ke
- Train mix: Uganda + Kenya
- Primary validation country: Uganda
- Max manifest audio length: 40.0 s
- Max training audio length: 40.0 s
- Min audio length: 0.2 s
- Train batch size: 8
- Eval batch size: 8
- Gradient accumulation steps: 8
- Effective train batch size: 64
- Learning rate: 5e-5
- Weight decay: 1e-3
- Warmup steps: 100
- Scheduler: CosineAnnealing
- Max steps: 20000
- Validation interval: 200 steps
- Early stopping patience: 12
- Precision: bf16-mixed when supported, otherwise mixed-precision fallback
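The training script itself is not reproduced in this card. A hedged sketch of how these hyperparameters map onto a NeMo/Lightning fine-tuning setup; the config keys follow common NeMo patterns and the manifest paths are assumptions, not the exact run:

```python
# Hedged fine-tuning sketch mirroring the configuration above. Older NeMo
# versions import `pytorch_lightning` instead of `lightning.pytorch`.
import lightning.pytorch as pl
from lightning.pytorch.callbacks import EarlyStopping
from omegaconf import OmegaConf
from nemo.collections.asr.models import ASRModel

model = ASRModel.from_pretrained("nvidia/parakeet-tdt-1.1b")

model.setup_training_data(OmegaConf.create({
    "manifest_filepath": "train_ug_ke.json",  # merged Uganda + Kenya manifest
    "sample_rate": 16000,
    "batch_size": 8,
    "max_duration": 40.0,
    "min_duration": 0.2,
    "shuffle": True,
}))
model.setup_validation_data(OmegaConf.create({
    "manifest_filepath": "val_uganda.json",   # Uganda-only selection set
    "sample_rate": 16000,
    "batch_size": 8,
}))
model.setup_optimization(OmegaConf.create({
    "name": "adamw",
    "lr": 5e-5,
    "weight_decay": 1e-3,
    "sched": {"name": "CosineAnnealing", "warmup_steps": 100, "max_steps": 20000},
}))

trainer = pl.Trainer(
    max_steps=20000,
    accumulate_grad_batches=8,       # 8 x 8 = effective batch of 64
    val_check_interval=200,          # validate every 200 steps
    precision="bf16-mixed",          # fall back to "16-mixed" if unsupported
    callbacks=[EarlyStopping(monitor="val_wer", mode="min", patience=12)],
)
model.set_trainer(trainer)
trainer.fit(model)
```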
Evaluation
Evaluation was run separately on the held-out Uganda and Kenya test splits, scoring both raw transcripts and normalized transcripts.
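The exact normalizer is not shown in this card. The sketch below illustrates the raw-versus-normalized comparison with `jiwer`, using lower-casing plus punctuation stripping as a representative stand-in for the normalization actually applied:

```python
# Representative raw-vs-normalized scoring sketch with jiwer; the normalize()
# function is an assumption, not the exact normalizer used in this run.
import re
import jiwer

def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)   # drop punctuation
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

refs = ["Hello, world!"]   # placeholder references
hyps = ["hello world"]     # placeholder hypotheses

raw_wer = jiwer.wer(refs, hyps)
norm_wer = jiwer.wer([normalize(r) for r in refs], [normalize(h) for h in hyps])
norm_cer = jiwer.cer([normalize(r) for r in refs], [normalize(h) for h in hyps])
print(raw_wer, norm_wer, norm_cer)
```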
Uganda Test Set
- Raw WER: 33.14%
- Raw CER: 15.53%
- Normalized WER: 21.59%
- Normalized CER: 12.92%
- Average normalized utterance WER (capped at 1.0): 20.86%
- Average normalized utterance CER (capped at 1.0): 12.88%
Kenya Test Set
- Raw WER: 33.62%
- Raw CER: 11.92%
- Normalized WER: 12.45%
- Normalized CER: 7.59%
- Average normalized utterance WER (capped at 1.0): 12.48%
- Average normalized utterance CER (capped at 1.0): 7.63%
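For both countries, the capped per-utterance averages differ slightly from the corpus-level scores because corpus WER weights utterances by reference length, while the capped average weights every utterance equally and clips each score at 1.0. A minimal sketch of that metric, assuming `jiwer`:

```python
# Sketch of the "average normalized utterance WER (capped at 1.0)" metric:
# score each utterance independently, clip at 1.0 so very short references
# with many insertions cannot dominate, then take the unweighted mean.
import jiwer

def avg_capped_wer(refs, hyps):
    scores = [min(jiwer.wer(r, h), 1.0) for r, h in zip(refs, hyps)]
    return sum(scores) / len(scores)
```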
Uganda and Kenya Comparison
The same shared 1.1B model performs much better on the Kenyan test split than
on the Ugandan test split.
- Kenya vs Uganda normalized WER: 12.45% vs 21.59%
- Kenya vs Uganda normalized CER: 7.59% vs 12.92%
- Absolute normalized WER gap: 9.14 points in favor of Kenya
Relative to the mixed-country 0.6B run
(KasuleTrevor/cdli-parakeet-en-ug-ke), the 1.1B model is uneven:
- Uganda normalized WER: 21.59% vs 21.33% for 0.6B
- Kenya normalized WER: 12.45% vs 14.56% for 0.6B
So the larger model did not improve Uganda, but it improved Kenya materially by
2.11 absolute normalized WER points.
Interpretation
This run is not a clean across-the-board win over the 0.6B model. The main
benefit is stronger Kenyan generalization, while Uganda remains roughly flat to
slightly worse.
That matters for model selection:
- If Uganda remains the primary deployment target, the 0.6B mixed-country run is still competitive.
- If the goal is broader East African English transfer, the 1.1B run is more interesting because of the Kenya gain.
Usage
```python
from nemo.collections.asr.models import ASRModel

# Load the fine-tuned checkpoint from the Hugging Face Hub.
model = ASRModel.from_pretrained("KasuleTrevor/cdli-parakeet-en-ug-ke-1.1b")

# Transcribe one or more 16 kHz WAV files.
predictions = model.transcribe(["path/to/audio.wav"])

# Recent NeMo versions return Hypothesis objects; older ones return strings.
print(predictions[0].text if hasattr(predictions[0], "text") else predictions[0])
```
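The training audio is 16 kHz (see Datasets), so resampling other sample rates to 16 kHz mono before transcription is a safe default. A sketch with `torchaudio`; the file paths are placeholders:

```python
# Optional pre-processing: resample to 16 kHz mono to match the training data.
import torchaudio
import torchaudio.functional as F

waveform, sr = torchaudio.load("path/to/audio.wav")
if sr != 16000:
    waveform = F.resample(waveform, sr, 16000)
waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
torchaudio.save("audio_16k.wav", waveform, 16000)
```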
Files
- EN-PARAKEET-TDT-UG-KE-1.1b.nemo: exported NeMo checkpoint
- results/EN-PARAKEET-TDT-UG-KE-1.1b/: uploaded result tables and breakdown CSVs
  - train_mix_summary_ug_ke.json
  - country_summary_ug_ke.csv
  - test_predictions_scored_uganda.csv
  - test_predictions_scored_kenya.csv
  - severity_breakdown_combined_ug_ke.csv
  - disorder_breakdown_combined_ug_ke.csv
  - etiology_breakdown_combined_ug_ke.csv
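A quick way to inspect the per-utterance result tables is pandas; the column names are not documented in this card, so the sketch below only prints the schema before any aggregation:

```python
# Inspect one of the uploaded scored-prediction tables. The local path assumes
# the results/ directory has been downloaded from the Hub alongside this card.
import pandas as pd

df = pd.read_csv("results/EN-PARAKEET-TDT-UG-KE-1.1b/test_predictions_scored_uganda.csv")
print(df.columns.tolist())  # confirm the actual schema before aggregating
print(len(df))              # expected to match the Uganda test size (1016)
```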
Notes
- Validation for early stopping and checkpoint selection was Uganda-only, even though training used both Uganda and Kenya.
- Source datasets are gated. Review the dataset terms before requesting access.
- Result artifacts for both countries were uploaded to the Hub under
results/EN-PARAKEET-TDT-UG-KE-1.1b/.