How to use BUT-FIT/Dixtral_QA with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="BUT-FIT/Dixtral_QA", trust_remote_code=True)
# Load model directly
from transformers import AutoProcessor, AutoModel
processor = AutoProcessor.from_pretrained("BUT-FIT/Dixtral_QA", trust_remote_code=True)
model = AutoModel.from_pretrained("BUT-FIT/Dixtral_QA", trust_remote_code=True)
This repository hosts Dixtral_QA, developed by BUT Speech@FIT. Dixtral couples the Voxtral-Mini-3B spoken-language model with the DiCoW diarization-conditioned encoder, giving the LLM target-speaker awareness in multi-talker audio.
This checkpoint is tuned for spoken question answering over conversational/meeting audio. For pure target-speaker transcription, use Dixtral_TS-ASR instead.
from transformers import AutoModel, AutoProcessor
MODEL_NAME = "BUT-FIT/Dixtral_QA"
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(MODEL_NAME)
➡️ For full inference pipelines (diarization → FDDT masks → generation), see the Dixtral GitHub repository.
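To make the middle step of that pipeline more concrete, the sketch below turns per-speaker diarization probabilities into four per-frame masks for one target speaker. It assumes the silence / target / non-target / overlap (STNO) formulation described for DiCoW. The function name `stno_masks`, the input layout, and the toy data are hypothetical and are not part of the Dixtral repository, whose actual mask format may differ.

```python
from math import prod


def stno_masks(diar, target):
    """Compute per-frame STNO probabilities for one target speaker.

    diar:   list of per-speaker activity probabilities, shape [S][T]
            (hypothetical layout, chosen for this illustration).
    target: index of the target speaker in `diar`.
    Returns four lists of length T: silence, target-only, non-target, overlap.
    """
    num_frames = len(diar[0])
    silence, target_only, non_target, overlap = [], [], [], []
    for t in range(num_frames):
        frame = [speaker[t] for speaker in diar]
        d_k = frame[target]
        # Nobody speaks: every speaker is inactive.
        p_s = prod(1.0 - d for d in frame)
        # Only the target speaks: target active, all others inactive.
        p_t = d_k * prod(1.0 - d for i, d in enumerate(frame) if i != target)
        silence.append(p_s)
        target_only.append(p_t)
        # Someone speaks, but not the target.
        non_target.append((1.0 - p_s) - d_k)
        # Target speaks together with at least one other speaker.
        overlap.append(d_k - p_t)
    return silence, target_only, non_target, overlap


# Toy hard diarization: 2 speakers, 4 frames (made-up data).
diar = [
    [0.0, 1.0, 1.0, 0.0],  # speaker 0 (target)
    [0.0, 0.0, 1.0, 1.0],  # speaker 1
]
s, t, n, o = stno_masks(diar, target=0)
```

In this toy input the four frames fall into silence, target-only, overlap, and non-target respectively, and the four masks sum to 1 at every frame.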
📧 Email: ipoloka@fit.vut.cz
🏢 Affiliation: BUT Speech@FIT, Brno University of Technology
🐙 GitHub: BUTSpeechFIT
Base model
mistralai/Voxtral-Mini-3B-2507