Add eos_token to tokenizer explicitly to help downstream vLLM integration
Summary
Declare the Parakeet TDT end-of-transcription token in the standard Hugging Face tokenizer and generation config metadata.
This updates:
- tokenizer_config.json to mark <|endoftext|> as the tokenizer EOS token
- generation_config.json to set eos_token_id to 3
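The two metadata edits can be sketched as a small script run against a local checkout of the repository. This is only an illustration: `declare_eos` is a hypothetical helper, and the sketch assumes that `tokenizer_config.json` already exists and that `generation_config.json` may need to be created.

```python
import json
from pathlib import Path

EOS_TOKEN = "<|endoftext|>"  # token already present in the vocabulary
EOS_TOKEN_ID = 3             # its existing id, per this PR


def declare_eos(repo_dir):
    """Hypothetical helper: declare the EOS token in both metadata files."""
    repo = Path(repo_dir)
    tok_path = repo / "tokenizer_config.json"
    gen_path = repo / "generation_config.json"

    # Mark <|endoftext|> as the tokenizer EOS token.
    tok_cfg = json.loads(tok_path.read_text(encoding="utf-8"))
    tok_cfg["eos_token"] = EOS_TOKEN
    tok_path.write_text(
        json.dumps(tok_cfg, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )

    # Assumption: generation_config.json may be absent, so start from {}.
    gen_cfg = {}
    if gen_path.exists():
        gen_cfg = json.loads(gen_path.read_text(encoding="utf-8"))
    gen_cfg["eos_token_id"] = EOS_TOKEN_ID
    gen_path.write_text(json.dumps(gen_cfg, indent=2) + "\n", encoding="utf-8")

    return tok_cfg, gen_cfg
```

Only two keys are written; every other entry in either file is left as it was.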
Rationale
The Parakeet tokenizer already contains <|endoftext|> at token id 3:
tokenizer.convert_tokens_to_ids("<|endoftext|>") == 3
tokenizer.decode([3]) == "<|endoftext|>"
The model also emits token id 3 as the terminal marker during TDT decoding. However, the current repository metadata does not expose that token as EOS:
tokenizer.eos_token_id is None
GenerationConfig.from_pretrained(...).eos_token_id is None
Downstream runtimes that rely on standard Hugging Face metadata therefore cannot discover the model’s stop token from either the tokenizer config or generation_config.json.
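To illustrate why the missing metadata matters, here is a minimal sketch of how a generic runtime might look up the stop token from the standard files. The function `discover_eos_token_id` and its lookup order are assumptions made for illustration; this is not the actual logic of vLLM or Transformers.

```python
def discover_eos_token_id(tokenizer_config, generation_config, vocab):
    """Hypothetical lookup: return the EOS id from standard metadata, or None.

    tokenizer_config / generation_config are the parsed JSON dicts;
    vocab maps token strings to ids.
    """
    # Prefer the explicit id in generation_config.json.
    eos_id = generation_config.get("eos_token_id")
    if eos_id is not None:
        return eos_id

    # Fall back to the tokenizer's declared EOS token string.
    eos_token = tokenizer_config.get("eos_token")
    if isinstance(eos_token, dict):
        # Some configs serialize the token as an object with a "content" field.
        eos_token = eos_token.get("content")
    if eos_token is not None:
        return vocab.get(eos_token)

    # Nothing declared: the runtime cannot know which id ends generation.
    return None


vocab = {"<|endoftext|>": 3}
before = discover_eos_token_id({}, {}, vocab)
after = discover_eos_token_id(
    {"eos_token": "<|endoftext|>"}, {"eos_token_id": 3}, vocab
)
```

With neither file declaring EOS, `before` is `None` even though the token sits in the vocabulary; with the declarations in place, `after` is `3`.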
Most ASR / speech-generation checkpoints expose this metadata through the standard files. For example, Whisper, Qwen3-ASR, Granite Speech, Fun-ASR, and FireRed ASR/LID publish eos_token_id in generation_config.json, and usually also declare the corresponding tokenizer EOS token.
Adding this metadata makes Parakeet consistent with those checkpoints and avoids downstream framework-specific workarounds.
Compatibility
This does not change tokenizer vocabulary or model weights. It only declares existing semantics:
- <|endoftext|> already exists in the tokenizer vocabulary
- token id 3 already decodes to <|endoftext|>
- token id 3 is already used as the model’s end marker
Expected behavior after the change:
tokenizer.eos_token == "<|endoftext|>"
tokenizer.eos_token_id == 3
GenerationConfig.from_pretrained(...).eos_token_id == 3
Downstream impact
This helps runtimes such as vLLM, Transformers-based serving stacks, and OpenAI-compatible transcription servers stop generation cleanly without adding Parakeet-specific stop-token workarounds. One such vLLM integration PR is https://github.com/vllm-project/vllm/pull/41708. Once this change is merged, the vLLM integration becomes simpler.
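As a rough sketch of what stopping cleanly means for a serving loop, the snippet below consumes emitted token ids and halts at the declared EOS id. `collect_until_eos` is a hypothetical helper, the `max_tokens` default is arbitrary, and the loop is a generic stand-in rather than Parakeet's TDT decoder or any runtime's real implementation.

```python
def collect_until_eos(token_ids, eos_token_id, max_tokens=256):
    """Hypothetical loop: keep tokens until EOS or the length cap is reached."""
    kept = []
    for tok in token_ids:
        # Stop on the declared EOS id, if the metadata provided one.
        if eos_token_id is not None and tok == eos_token_id:
            break
        # Otherwise the only stopping condition is the length cap.
        if len(kept) >= max_tokens:
            break
        kept.append(tok)
    return kept


emitted = [10, 11, 3, 12]  # made-up ids; 3 stands for <|endoftext|>
with_eos = collect_until_eos(emitted, eos_token_id=3)
without_eos = collect_until_eos(emitted, eos_token_id=None)
```

When the EOS id is known, `with_eos` ends right before token 3. When it is `None`, `without_eos` keeps the marker and everything after it, which is the situation a model-specific workaround would otherwise have to handle.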