CDLI Parakeet TDT 1.1B English Fine-Tune (`lr=5e-5`)

This repository contains a NeMo ASR model fine-tuned from nvidia/parakeet-tdt-1.1b on the gated cdli/ugandan_english_nonstandard_speech_v1.0 dataset.

This card documents a follow-up 1.1B run at a lower learning rate (5e-5), started after the earlier 1e-4 run plateaued early. It is the stronger of the two runs.

Model Details

Base model: nvidia/parakeet-tdt-1.1b
Fine-tuning framework: NVIDIA NeMo
Language: English
Acoustic model family: FastConformer-TDT / RNNT-BPE

Dataset

Dataset: cdli/ugandan_english_nonstandard_speech_v1.0
License: cc-by-sa-4.0
Split sizes reported by the source dataset card:
- train: 5176
- validation: 638
- test: 1017

The evaluation artifacts in this run contain 1016 scored rows, one fewer than the 1017 test rows listed on the dataset card.

Training Configuration

Work root: /jupyter_kernel/parakeet_cdli_en_5e5
Base checkpoint: nvidia/parakeet-tdt-1.1b
Max manifest audio length: 40.0 s
Max training audio length: 30.0 s
Min audio length: 0.2 s
Train batch size: 4
Eval batch size: 8
Gradient accumulation steps: 8
Effective train batch size: 32
Learning rate: 5e-5
Weight decay: 1e-3
Warmup steps: 100
Scheduler: CosineAnnealing
Max steps configured: 20000
Early stopping patience: 10
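
The schedule implied by the settings above can be sketched as linear warmup followed by cosine decay. This is a generic illustration, not NeMo's exact CosineAnnealing implementation: the function name `lr_at_step` and the floor `min_lr=0.0` are assumptions, since the card does not state a minimum learning rate.

```python
import math

# Values taken from the configuration list above.
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8
EFFECTIVE_BATCH_SIZE = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS  # 4 * 8 = 32


def lr_at_step(step, base_lr=5e-5, warmup_steps=100, max_steps=20000, min_lr=0.0):
    """Learning rate at a given optimizer step (hypothetical helper).

    Linear warmup from 0 to base_lr, then cosine decay toward min_lr.
    min_lr=0.0 is an assumption; the card does not report it.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / (max_steps - warmup_steps))
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 50, 100, 10050, 20000):
        print(s, lr_at_step(s))
```

Because early stopping is enabled with a patience of 10, a run may end well before step 20000, in which case the tail of the cosine curve is never reached.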

Evaluation

Evaluation was run on the held-out test split. Predictions were compared against the reference transcripts both as-is (raw) and after text normalization.

Corpus Metrics

Raw WER: 31.57%
Raw CER: 15.09%
Normalized WER: 21.20%
Normalized CER: 12.56%
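
The gap between the raw and normalized figures above comes from casing and punctuation differences being counted as errors in the raw comparison. A minimal sketch of corpus-level WER and CER follows. The `normalize` function is a hypothetical stand-in (lowercasing, punctuation removal, whitespace collapsing); the card does not specify which normalizer was actually used.

```python
import re


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(text):
    """Hypothetical normalizer: lowercase, strip punctuation, collapse spaces."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def corpus_error_rates(refs, hyps, normalizer=None):
    """Return (WER, CER) as total errors divided by total reference length."""
    word_err = word_total = char_err = char_total = 0
    for ref, hyp in zip(refs, hyps):
        if normalizer is not None:
            ref, hyp = normalizer(ref), normalizer(hyp)
        word_err += edit_distance(ref.split(), hyp.split())
        word_total += len(ref.split())
        char_err += edit_distance(ref, hyp)
        char_total += len(ref)
    return word_err / word_total, char_err / char_total


if __name__ == "__main__":
    refs, hyps = ["Hello, world."], ["hello world"]
    print(corpus_error_rates(refs, hyps))             # raw
    print(corpus_error_rates(refs, hyps, normalize))  # normalized
```

In this toy pair the two transcripts differ only in casing and punctuation, so both words count as errors in the raw comparison and none do after normalization.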

Average Utterance Metrics

Average normalized utterance WER (capped at 1.0): 20.70%
Average normalized utterance CER (capped at 1.0): 12.58%
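
The averaged figures differ from the corpus metrics because each utterance counts equally regardless of length, and any utterance whose error rate exceeds 1.0 is capped. A small sketch of the two aggregations, taking per-utterance error counts as input; the function names and the choice to skip empty references are assumptions, not details from this run.

```python
def corpus_rate(rows):
    """rows: list of (edit_errors, reference_length), one pair per utterance.

    Corpus-level rate: total errors over total reference length,
    so long utterances carry more weight.
    """
    total_err = sum(err for err, _ in rows)
    total_len = sum(length for _, length in rows)
    return total_err / total_len


def capped_mean_rate(rows, cap=1.0):
    """Mean of per-utterance rates, each capped at `cap`.

    Utterances with an empty reference are skipped (an assumption).
    """
    rates = [min(cap, err / length) for err, length in rows if length > 0]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Toy data: the second utterance has more errors than reference words,
    # which can happen when the hypothesis contains many insertions.
    rows = [(1, 10), (6, 3), (0, 7)]
    print(corpus_rate(rows))       # 7 / 20
    print(capped_mean_rate(rows))  # (0.1 + 1.0 + 0.0) / 3
```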

Files

EN-PARAKEET-TDT-F1tdt-1-1b.nemo: exported NeMo checkpoint
checkpoints/: intermediate training checkpoints
test_predictions.csv
test_predictions.jsonl
test_predictions_scored.csv
test_predictions_scored.jsonl
test_predictions_grouped_analysis.csv
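
The exported checkpoint listed above can likely be loaded with NeMo's generic ASR restore path. The sketch below assumes the NeMo toolkit is installed and that the `.nemo` file has been downloaded locally. The helper name `transcribe_files` and the audio path in the usage comment are hypothetical.

```python
def transcribe_files(nemo_path, audio_paths, batch_size=8):
    """Restore a .nemo ASR checkpoint and transcribe a list of audio files.

    NeMo is imported lazily so this module can be imported without it.
    """
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.restore_from(restore_path=nemo_path)
    model.eval()
    # The return type of transcribe() varies across NeMo versions
    # (plain strings, hypothesis objects, or a tuple for transducer models).
    return model.transcribe(audio_paths, batch_size=batch_size)


# Example usage (paths are placeholders):
# outputs = transcribe_files("EN-PARAKEET-TDT-F1tdt-1-1b.nemo", ["sample.wav"])
```

The default `batch_size=8` mirrors the eval batch size in the training configuration; smaller values may be needed on GPUs with limited memory.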

Notes

This 5e-5 run improved substantially over the earlier 1.1B 1e-4 run.
Access to the source dataset is gated. Review the dataset terms before requesting access.
