Automatic Speech Recognition
Transformers
Safetensors
VibeVoice
ASR
Transcription
Diarization
Speech-to-Text
Instructions to use microsoft/VibeVoice-ASR with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/VibeVoice-ASR with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("automatic-speech-recognition", model="microsoft/VibeVoice-ASR")

    # Load model directly
    from transformers import VibeVoiceForASRTraining

    model = VibeVoiceForASRTraining.from_pretrained("microsoft/VibeVoice-ASR", dtype="auto")

- Notebooks
- Google Colab
- Kaggle
Add technical report
#11
by bezzam HF Staff - opened
README.md
CHANGED
@@ -19,6 +19,7 @@ library_name: transformers
 
 **VibeVoice-ASR** is a unified speech-to-text model designed to handle **60-minute long-form audio** in a single pass, generating structured transcriptions containing **Who (Speaker), When (Timestamps), and What (Content)**, with support for **Customized Hotwords**.
 
+➡️ **Technical Report:** [VibeVoice ASR Technical Report](https://huggingface.co/papers/2601.18184)<br>
 ➡️ **Code:** [microsoft/VibeVoice](https://github.com/microsoft/VibeVoice)<br>
 ➡️ **Demo:** [VibeVoice-ASR-Demo](https://aka.ms/vibevoice-asr)
 