Multi-lingual NVASR
Multi-lingual Nonverbal Vocalization Automatic Speech Recognition
Multi-lingual NVASR is a speech recognition model fine-tuned from SenseVoice-Small for transcribing both regular speech and nonverbal vocalizations (NVVs) with a unified paralinguistic label taxonomy. It is a core component of the NV-Bench evaluation pipeline.
Highlights
- Multi-lingual Support: Chinese (zh), English (en)
- NVV-Aware Transcription: Accurately transcribes nonverbal vocalizations (laughter, coughs, sighs, etc.) as structured tags within text
- High-Quality General ASR: Maintains competitive CER on standard ASR benchmarks while significantly outperforming baselines on NVV-specific tasks
- Unified Label Taxonomy: Consistent paralinguistic labels across all supported languages
NVV Taxonomy
NVVs are organized into three functional levels:
| Function | Categories |
|---|---|
| Vegetative | [Cough], [Sigh], [Breathing] |
| Affect Burst | [Surprise-oh], [Surprise-ah], [Dissatisfaction-hnn], [Laughter] |
| Conversational Grunt | [Uhm], [Question-en/oh/ah/ei/huh], [Confirmation-en] |
Mandarin supports 13 NVV categories; English supports 7 categories.
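For downstream analysis, the table can be turned into a small lookup from tag to functional level. The sketch below is illustrative only: `NVV_TAXONOMY`, `TAG_TO_LEVEL`, and `functional_level` are hypothetical names, and it assumes the shorthand `[Question-en/oh/ah/ei/huh]` stands for five separate tags, which the table does not state explicitly.

```python
from typing import Optional

# Hypothetical lookup built from the taxonomy table above.
# Assumption: "[Question-en/oh/ah/ei/huh]" abbreviates five separate tags.
NVV_TAXONOMY = {
    "Vegetative": ["Cough", "Sigh", "Breathing"],
    "Affect Burst": ["Surprise-oh", "Surprise-ah", "Dissatisfaction-hnn", "Laughter"],
    "Conversational Grunt": [
        "Uhm",
        "Question-en", "Question-oh", "Question-ah", "Question-ei", "Question-huh",
        "Confirmation-en",
    ],
}

# Invert the table: tag name -> functional level.
TAG_TO_LEVEL = {tag: level for level, tags in NVV_TAXONOMY.items() for tag in tags}


def functional_level(tag: str) -> Optional[str]:
    """Return the functional level for a tag such as "[Cough]" or "Cough"."""
    return TAG_TO_LEVEL.get(tag.strip().strip("[]"))


print(functional_level("[Laughter]"))  # Affect Burst
print(functional_level("[Unknown]"))   # None
```

Unknown tags map to `None` rather than raising, so the caller can decide how to handle labels outside the taxonomy.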
Usage
Quick Start with FunASR
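from funasr import AutoModel

# Load the model from a local directory containing the files listed under File Structure
model = AutoModel(model="path/to/Multi-lingual-NVASR")

# Single file inference
res = model.generate(
    input="example/zh.mp3",
    language="auto",  # detect the language automatically
    use_itn=True,     # apply inverse text normalization to the output
)
print(res[0]["text"])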
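Because NVVs come back as bracketed tags inside the transcript, a small post-processing step can separate them from the spoken text. This is a minimal sketch and not part of FunASR: `split_transcript` is a hypothetical helper, the sample string is made up, and the regex assumes tags look like `[Name]` as in the taxonomy table.

```python
import re
from typing import List, Tuple

# Assumption: NVV tags appear inline as "[Name]", e.g. "[Laughter]".
NVV_TAG = re.compile(r"\[[A-Za-z][A-Za-z-]*\]")


def split_transcript(text: str) -> Tuple[str, List[str]]:
    """Return (text without NVV tags, NVV tags in order of appearance)."""
    tags = NVV_TAG.findall(text)
    plain = NVV_TAG.sub(" ", text)
    plain = " ".join(plain.split())  # collapse whitespace left by removed tags
    return plain, tags


# Made-up transcript for illustration; real input would be res[0]["text"].
plain, tags = split_transcript("[Laughter] that was great [Sigh]")
print(plain)  # that was great
print(tags)   # ['[Laughter]', '[Sigh]']
```

Each removed tag is replaced by a single space, which suits English; for Chinese text you may prefer to drop the tags without inserting a space.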
Evaluation Metrics
Multi-lingual NVASR supports the following evaluation metrics used in the NV-Bench pipeline:
| Metric | Description |
|---|---|
| OCER / OWER | Overall Character/Word Error Rate (text + NVV tags) |
| PCER / PWER | Paralinguistic CER/WER (NVV tags only) |
| CER / WER | Text-only error rate (NVV tags removed) |
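To make the three metric families concrete, the sketch below computes character-level versions for a single reference/hypothesis pair. It is an illustration, not the NV-Bench scoring code: it assumes each `[Name]` tag counts as one token, that whitespace is ignored, and that each rate is the Levenshtein distance divided by the reference length. All function names are hypothetical.

```python
import re
from typing import Dict, List

# Assumption: a token is either a whole "[Name]" tag or one non-space character.
TOKEN = re.compile(r"\[[A-Za-z][A-Za-z-]*\]|\S")


def tokenize(text: str) -> List[str]:
    return TOKEN.findall(text)


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(ref: List[str], hyp: List[str]) -> float:
    # Guard against an empty reference (e.g. no NVV tags in the reference).
    return edit_distance(ref, hyp) / max(len(ref), 1)


def nvv_metrics(ref: str, hyp: str) -> Dict[str, float]:
    r, h = tokenize(ref), tokenize(hyp)

    def is_tag(token: str) -> bool:
        return len(token) > 1  # single characters are text; longer tokens are tags

    return {
        "OCER": error_rate(r, h),  # text + NVV tags
        "PCER": error_rate([t for t in r if is_tag(t)],
                           [t for t in h if is_tag(t)]),  # NVV tags only
        "CER": error_rate([t for t in r if not is_tag(t)],
                          [t for t in h if not is_tag(t)]),  # tags removed
    }


m = nvv_metrics("[Laughter]你好吗", "[Sigh]你好啊")
print({k: round(v, 3) for k, v in m.items()})
# {'OCER': 0.5, 'PCER': 1.0, 'CER': 0.333}
```

Word-level variants (OWER, PWER, WER) would follow the same pattern with whitespace-delimited words in place of single characters.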
"Our NVASR model maintains high-quality general ASR while significantly outperforming baselines on NVV-specific tasks." (NV-Bench)
File Structure
Multi-lingual NVASR/
├── model.pt                        # Model weights (~2.8 GB)
├── config.yaml                     # Model architecture configuration
├── configuration.json              # FunASR pipeline configuration
├── am.mvn                          # Acoustic model mean-variance normalization
├── paralingustic_tokenizer.model   # SentencePiece tokenizer with NVV vocabulary
└── example/                        # Example audio files
    ├── zh.mp3                      # Chinese example
    └── en.mp3                      # English example
Related Resources
- NV-Bench Project Page: https://nvbench.github.io
- NV-Bench Dataset: Hugging Face
- SenseVoice: GitHub
Citation
If you use this model, please cite:
Coming soon
License
This project is licensed under the CC BY-NC 4.0 license.
Model tree for AnonyData/Multilingual-NVASR
Base model
FunAudioLLM/SenseVoiceSmall