microsoft/VibeVoice-ASR

Automatic Speech Recognition

Add technical report

#11

by bezzam HF Staff - opened Jan 27

base: refs/heads/main ← from: refs/pr/11

README.md +1 -0

@@ -19,6 +19,7 @@ library_name: transformers
 **VibeVoice-ASR** is a unified speech-to-text model designed to handle **60-minute long-form audio** in a single pass, generating structured transcriptions containing **Who (Speaker), When (Timestamps), and What (Content)**, with support for **Customized Hotwords**.
+➡️ **Technical Report:** [VibeVoice ASR Technical Report](https://huggingface.co/papers/2601.18184)<br>
 ➡️ **Code:** [microsoft/VibeVoice](https://github.com/microsoft/VibeVoice)<br>
 ➡️ **Demo:** [VibeVoice-ASR-Demo](https://aka.ms/vibevoice-asr)