You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

Multi-lingual NVASR

Multi-lingual Nonverbal Vocalization Automatic Speech Recognition

Demo Page Dataset Model

Multi-lingual NVASR is a speech recognition model fine-tuned from SenseVoice-Small for transcribing both regular speech and nonverbal vocalizations (NVVs) with a unified paralinguistic label taxonomy. It is a core component of the NV-Bench evaluation pipeline.

Highlights

  • πŸ—£οΈ Multi-lingual Support β€” Chinese (zh), English (en)
  • 🎯 NVV-Aware Transcription β€” Accurately transcribes nonverbal vocalizations (laughter, coughs, sighs, etc.) as structured tags within text
  • πŸ“Š High-Quality General ASR β€” Maintains competitive CER on standard ASR benchmarks while significantly outperforming baselines on NVV-specific tasks
  • 🏷️ Unified Label Taxonomy β€” Consistent paralinguistic labels across all supported languages

NVV Taxonomy

NVVs are organized into three functional levels:

Function Categories
Vegetative [Cough], [Sigh], [Breathing]
Affect Burst [Surprise-oh], [Surprise-ah], [Dissatisfaction-hnn], [Laughter]
Conversational Grunt [Uhm], [Question-en/oh/ah/ei/huh], [Confirmation-en]

Mandarin supports 13 NVV categories; English supports 7 categories.

Usage

Quick Start with FunASR

from funasr import AutoModel

model = AutoModel(model="path/to/Multi-lingual-NVASR")

# Single file inference
res = model.generate(
    input="example/zh.mp3",
    language="auto",
    use_itn=True,
)
print(res[0]["text"])

Evaluation Metrics

Multi-lingual NVASR supports the following evaluation metrics used in the NV-Bench pipeline:

Metric Description
OCER / OWER Overall Character/Word Error Rate (text + NVV tags)
PCER / PWER Paralinguistic CER/WER (NVV tags only)
CER / WER Text-only error rate (NVV tags removed)

Our NVASR model maintains high-quality general ASR while significantly outperforming baselines on NVV-specific tasks. β€” NV-Bench

File Structure

Multi-lingual NVASR/
β”œβ”€β”€ model.pt                        # Model weights (~2.8 GB)
β”œβ”€β”€ config.yaml                     # Model architecture configuration
β”œβ”€β”€ configuration.json              # FunASR pipeline configuration
β”œβ”€β”€ am.mvn                          # Acoustic model mean-variance normalization
β”œβ”€β”€ paralingustic_tokenizer.model   # SentencePiece tokenizer with NVV vocabulary
β”œβ”€β”€ example/                        # Example audio files
β”‚   β”œβ”€β”€ zh.mp3                      # Chinese example
β”‚   β”œβ”€β”€ en.mp3                      # English example

Related Resources

Citation

If you use this model, please cite:

Coming soon

License

This project is licensed under the CC BY-NC-4.0 License.

Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for AnonyData/Multilingual-NVASR

Finetuned
(6)
this model

Datasets used to train AnonyData/Multilingual-NVASR