CDLI Parakeet TDT 1.1B English Fine-Tune (Uganda + Kenya)
This repository contains a NeMo ASR model fine-tuned from
nvidia/parakeet-tdt-1.1b on a merged atypical-English training mix from
Uganda and Kenya.
The setup mirrors the mixed-country 0.6B notebook, but swaps in the 1.1B
checkpoint and keeps the same evaluation pattern: Uganda-only validation for
selection, then separate held-out reporting for Uganda and Kenya.
Model Details
- Base model: nvidia/parakeet-tdt-1.1b
- Fine-tuning framework: NVIDIA NeMo
- Language: English
- Acoustic model family: FastConformer-TDT / RNNT-BPE
- Output text: lower-case English transcription with standard ASR normalization
Datasets
- Uganda: cdli/ugandan_english_nonstandard_speech_v1.0
- Kenya: cdli/kenyan_english_nonstandard_speech_v1.0
- License: cc-by-sa-4.0
- Audio sampling rate: 16 kHz
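Both corpora are gated on the Hub (see Notes below). Assuming access has already been granted, a minimal loading sketch with the Hugging Face `datasets` library:

```python
# Minimal loading sketch; assumes access to the gated CDLI datasets has been
# granted and that you are authenticated (e.g. via `huggingface-cli login`).
from datasets import load_dataset

uganda = load_dataset("cdli/ugandan_english_nonstandard_speech_v1.0")
kenya = load_dataset("cdli/kenyan_english_nonstandard_speech_v1.0")

# Inspect split sizes and schema before building training manifests.
print(uganda)
print(kenya)
```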
Split sizes used in this run:
- Uganda train: 5175
- Uganda validation: 638
- Uganda test: 1016
- Kenya train: 4374
- Kenya validation: 542
- Kenya test: 928
- Merged train total: 9549 rows, 56.46 hours
- Validation selection set: Uganda only, 638 rows, 4.33 hours
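The two country train splits were concatenated for training (5175 + 4374 = 9549 rows). A hedged sketch of that merge as a NeMo-style JSON-lines manifest, applying the length filters listed in the training configuration below; the input manifest paths are assumptions:

```python
# Hedged sketch of merging the per-country train manifests into one NeMo-style
# JSON-lines manifest. Field names follow NeMo's standard manifest schema
# (audio_filepath, duration, text); the file paths are assumptions.
import json

def merge_manifests(paths, out_path, max_duration=40.0, min_duration=0.2):
    """Concatenate manifests, dropping utterances outside the length limits."""
    kept = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                for line in f:
                    entry = json.loads(line)
                    if min_duration <= entry["duration"] <= max_duration:
                        out.write(json.dumps(entry) + "\n")
                        kept += 1
    return kept

total = merge_manifests(
    ["train_uganda.json", "train_kenya.json"], "train_ug_ke.json"
)
print(f"merged train rows: {total}")  # 9549 in this run
```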
Training Configuration
- Work root: /jupyter_kernel/parakeet_cdli_en_ug_ke
- Train mix: Uganda + Kenya
- Primary validation country: Uganda
- Max manifest audio length: 40.0 s
- Max training audio length: 40.0 s
- Min audio length: 0.2 s
- Train batch size: 8
- Eval batch size: 8
- Gradient accumulation steps: 8
- Effective train batch size: 64
- Learning rate: 5e-5
- Weight decay: 1e-3
- Warmup steps: 100
- Scheduler: CosineAnnealing
- Max steps: 20000
- Validation interval: 200 steps
- Early stopping patience: 12
- Precision: bf16-mixed when supported, otherwise mixed-precision fallback
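The training script itself is not reproduced in this card. A hedged sketch of how these hyperparameters map onto a NeMo/Lightning fine-tuning setup; the config keys follow common NeMo patterns and the manifest paths are assumptions, not the exact run:

```python
# Hedged fine-tuning sketch mirroring the configuration above. Older NeMo
# versions import `pytorch_lightning` instead of `lightning.pytorch`.
import lightning.pytorch as pl
from lightning.pytorch.callbacks import EarlyStopping
from omegaconf import OmegaConf
from nemo.collections.asr.models import ASRModel

model = ASRModel.from_pretrained("nvidia/parakeet-tdt-1.1b")

model.setup_training_data(OmegaConf.create({
    "manifest_filepath": "train_ug_ke.json",  # merged Uganda + Kenya manifest
    "sample_rate": 16000,
    "batch_size": 8,
    "max_duration": 40.0,
    "min_duration": 0.2,
    "shuffle": True,
}))
model.setup_validation_data(OmegaConf.create({
    "manifest_filepath": "val_uganda.json",   # Uganda-only selection set
    "sample_rate": 16000,
    "batch_size": 8,
}))
model.setup_optimization(OmegaConf.create({
    "name": "adamw",
    "lr": 5e-5,
    "weight_decay": 1e-3,
    "sched": {"name": "CosineAnnealing", "warmup_steps": 100, "max_steps": 20000},
}))

trainer = pl.Trainer(
    max_steps=20000,
    accumulate_grad_batches=8,       # 8 x 8 = effective batch of 64
    val_check_interval=200,          # validate every 200 steps
    precision="bf16-mixed",          # fall back to "16-mixed" if unsupported
    callbacks=[EarlyStopping(monitor="val_wer", mode="min", patience=12)],
)
model.set_trainer(trainer)
trainer.fit(model)
```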
Evaluation
Evaluation was run separately on the held-out Uganda and Kenya test splits, scoring both raw transcripts and normalized transcripts.
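The exact normalizer is not shown in this card. The sketch below illustrates the raw-versus-normalized comparison with `jiwer`, using lower-casing plus punctuation stripping as a representative stand-in for the normalization actually applied:

```python
# Representative raw-vs-normalized scoring sketch with jiwer; the normalize()
# function is an assumption, not the exact normalizer used in this run.
import re
import jiwer

def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)   # drop punctuation
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

refs = ["Hello, world!"]   # placeholder references
hyps = ["hello world"]     # placeholder hypotheses

raw_wer = jiwer.wer(refs, hyps)
norm_wer = jiwer.wer([normalize(r) for r in refs], [normalize(h) for h in hyps])
norm_cer = jiwer.cer([normalize(r) for r in refs], [normalize(h) for h in hyps])
print(raw_wer, norm_wer, norm_cer)
```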
Uganda Test Set
- Raw WER: 33.14%
- Raw CER: 15.53%
- Normalized WER: 21.59%
- Normalized CER: 12.92%
- Average normalized utterance WER (capped at 1.0): 20.86%
- Average normalized utterance CER (capped at 1.0): 12.88%
Kenya Test Set
- Raw WER: 33.62%
- Raw CER: 11.92%
- Normalized WER: 12.45%
- Normalized CER: 7.59%
- Average normalized utterance WER (capped at 1.0): 12.48%
- Average normalized utterance CER (capped at 1.0): 7.63%
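For both countries, the capped per-utterance averages differ slightly from the corpus-level scores because corpus WER weights utterances by reference length, while the capped average weights every utterance equally and clips each score at 1.0. A minimal sketch of that metric, assuming `jiwer`:

```python
# Sketch of the "average normalized utterance WER (capped at 1.0)" metric:
# score each utterance independently, clip at 1.0 so very short references
# with many insertions cannot dominate, then take the unweighted mean.
import jiwer

def avg_capped_wer(refs, hyps):
    scores = [min(jiwer.wer(r, h), 1.0) for r, h in zip(refs, hyps)]
    return sum(scores) / len(scores)
```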
Uganda and Kenya Comparison
The same shared 1.1B model performs much better on the Kenyan test split than
on the Ugandan test split.
- Kenya vs Uganda normalized WER: 12.45% vs 21.59%
- Kenya vs Uganda normalized CER: 7.59% vs 12.92%
- Absolute normalized WER gap: 9.14 points in favor of Kenya
Relative to the mixed-country 0.6B run
(KasuleTrevor/cdli-parakeet-en-ug-ke), the 1.1B model is uneven:
- Uganda normalized WER: 21.59% vs 21.33% for 0.6B
- Kenya normalized WER: 12.45% vs 14.56% for 0.6B
So the larger model did not improve Uganda, but it improved Kenya materially by
2.11 absolute normalized WER points.
Interpretation
This run is not a clean across-the-board win over the 0.6B model. The main
benefit is stronger Kenyan generalization, while Uganda remains roughly flat to
slightly worse.
That matters for model selection:
- If Uganda remains the primary deployment target, the 0.6B mixed-country run is still competitive.
- If the goal is broader East African English transfer, the 1.1B run is more interesting because of the Kenya gain.
Usage
```python
from nemo.collections.asr.models import ASRModel

# Load the fine-tuned checkpoint from the Hugging Face Hub.
model = ASRModel.from_pretrained("KasuleTrevor/cdli-parakeet-en-ug-ke-1.1b")

# Transcribe one or more 16 kHz WAV files.
predictions = model.transcribe(["path/to/audio.wav"])

# Recent NeMo versions return Hypothesis objects; older ones return strings.
print(predictions[0].text if hasattr(predictions[0], "text") else predictions[0])
```
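The training audio is 16 kHz (see Datasets), so resampling other sample rates to 16 kHz mono before transcription is a safe default. A sketch with `torchaudio`; the file paths are placeholders:

```python
# Optional pre-processing: resample to 16 kHz mono to match the training data.
import torchaudio
import torchaudio.functional as F

waveform, sr = torchaudio.load("path/to/audio.wav")
if sr != 16000:
    waveform = F.resample(waveform, sr, 16000)
waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
torchaudio.save("audio_16k.wav", waveform, 16000)
```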
Files
- EN-PARAKEET-TDT-UG-KE-1.1b.nemo: exported NeMo checkpoint
- results/EN-PARAKEET-TDT-UG-KE-1.1b/: uploaded result tables and breakdown CSVs
  - train_mix_summary_ug_ke.json
  - country_summary_ug_ke.csv
  - test_predictions_scored_uganda.csv
  - test_predictions_scored_kenya.csv
  - severity_breakdown_combined_ug_ke.csv
  - disorder_breakdown_combined_ug_ke.csv
  - etiology_breakdown_combined_ug_ke.csv
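A quick way to inspect the per-utterance result tables is pandas; the column names are not documented in this card, so the sketch below only prints the schema before any aggregation:

```python
# Inspect one of the uploaded scored-prediction tables. The local path assumes
# the results/ directory has been downloaded from the Hub alongside this card.
import pandas as pd

df = pd.read_csv("results/EN-PARAKEET-TDT-UG-KE-1.1b/test_predictions_scored_uganda.csv")
print(df.columns.tolist())  # confirm the actual schema before aggregating
print(len(df))              # expected to match the Uganda test size (1016)
```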
Notes
- Validation for early stopping and checkpoint selection was Uganda-only, even though training used both Uganda and Kenya.
- Source datasets are gated. Review the dataset terms before requesting access.
- Result artifacts for both countries were uploaded to the Hub under
results/EN-PARAKEET-TDT-UG-KE-1.1b/.