How to use suiyaradant/whisper-large-v3 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="suiyaradant/whisper-large-v3")
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("suiyaradant/whisper-large-v3", dtype="auto")

This is a custom packaged version of OpenAI's Whisper Large V3 model, converted to safetensors format for safer and faster loading. This repo includes both the model weights and tokenizer files required for ASR (Automatic Speech Recognition) tasks.
model.safetensors: Model weights in safetensors format
tokenizer_config.json: Tokenizer configuration
vocab.json: Vocabulary file
merges.txt: BPE merges
special_tokens_map.json: Special token mapping

from transformers import WhisperForConditionalGeneration, WhisperTokenizer
model = WhisperForConditionalGeneration.from_pretrained("Zvatlov/whisper-large-v3")
tokenizer = WhisperTokenizer.from_pretrained("Zvatlov/whisper-large-v3")
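
One reason safetensors is considered safer than pickle-based checkpoints is that the file begins with a plain JSON header describing every tensor, so it can be inspected without executing any code. The following is a minimal sketch that reads that header using only the standard library. It assumes a locally downloaded copy of model.safetensors, and the helper names `read_safetensors_header` and `summarize_tensors` are hypothetical, not part of any library.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as an unsigned
        # little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The next header_len bytes are UTF-8 JSON mapping each tensor name
        # to its "dtype", "shape" and "data_offsets"; an optional
        # "__metadata__" entry holds free-form string metadata.
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize_tensors(header):
    """Map each tensor name to (dtype, shape), skipping the metadata entry."""
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"
    }
```

Calling `summarize_tensors(read_safetensors_header("model.safetensors"))` on a local copy lists the tensors without loading any weights into memory. For a half-precision checkpoint, the dtype entries would be expected to read `"F16"`.
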
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration
processor = WhisperProcessor.from_pretrained("Zvatlov/whisper-large-v3")
model = WhisperForConditionalGeneration.from_pretrained("Zvatlov/whisper-large-v3")
# Load audio
from datasets import load_dataset
ds = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
input_audio = ds[0]["audio"]["array"]
# Prepare input
inputs = processor(input_audio, sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(inputs["input_features"])
# Decode output
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(transcription[0])
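
The example above transcribes a single short clip. Whisper's feature extractor pads or truncates each input to one 30-second window of 16 kHz audio, so a longer recording has to be split before it is passed to the processor in this way. The sketch below shows one simple way to do that. The helper `split_into_windows` and its constants are hypothetical names introduced for illustration, and the split is naive: windows do not overlap, so a word can be cut at a boundary.

```python
WHISPER_SAMPLE_RATE = 16_000   # Hz, the rate Whisper's feature extractor expects
WHISPER_CHUNK_SECONDS = 30     # length of one Whisper input window


def split_into_windows(samples, sample_rate=WHISPER_SAMPLE_RATE,
                       chunk_seconds=WHISPER_CHUNK_SECONDS):
    """Split a 1-D sequence of samples into consecutive windows.

    Every window holds at most sample_rate * chunk_seconds samples;
    the last one may be shorter.
    """
    if sample_rate != WHISPER_SAMPLE_RATE:
        raise ValueError(
            f"expected {WHISPER_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; "
            "resample first"
        )
    window = sample_rate * chunk_seconds
    # Slicing works the same way for Python lists and NumPy arrays.
    return [samples[i:i + window] for i in range(0, len(samples), window)]
```

Each returned window can then go through the same `processor(...)`, `model.generate(...)` and `batch_decode(...)` steps, with the resulting texts joined afterwards. For production use, the `chunk_length_s` argument of the Transformers ASR pipeline is likely a better fit, since it handles overlapping windows.
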
Tensor type: FP16
Base model: openai/whisper-large-v3