# ov-whisper_large_v3_turbo-int8-2026.0.0
openai/whisper-large-v3-turbo exported to OpenVINO IR with INT8 asymmetric weight compression (group size 128).
The model layout targets `openvino_genai.WhisperPipeline` and includes a stateful decoder (`-with-past`), tokenizer, and detokenizer.
Whisper large-v3-turbo is a pruned and fine-tuned variant of Whisper large-v3 that runs roughly 6x faster with minimal quality loss. It supports 99 languages.
## Quantization details
| Parameter | Value |
|---|---|
| Source model | openai/whisper-large-v3-turbo |
| Weight format | INT8 asymmetric (group-wise) |
| Group size | 128 |
| Encoder layers compressed | 194 / 194 (100%) |
| Decoder layers compressed | 42 / 42 (100%) |
| Task | automatic-speech-recognition-with-past |
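To make the table concrete, the sketch below illustrates what group-wise asymmetric INT8 compression means for a single group of 128 weights: each group gets its own scale and zero point, weights are stored as unsigned 8-bit codes, and dequantization recovers them to within one quantization step. This is an illustrative toy, not the actual NNCF implementation; all function names here are made up.

```python
import numpy as np

GROUP_SIZE = 128  # matches the export setting above

def quantize_group_int8_asym(w: np.ndarray):
    """Asymmetric INT8 quantization of one weight group (illustrative only)."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 or 1.0  # avoid a zero scale for constant groups
    zero_point = np.round(-lo / scale)
    q = np.clip(np.round(w / scale + zero_point), 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize_group(q: np.ndarray, scale: float, zero_point: float) -> np.ndarray:
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(GROUP_SIZE).astype(np.float32)
q, s, zp = quantize_group_int8_asym(w)
w_hat = dequantize_group(q, s, zp)
# Reconstruction error stays within one quantization step per weight
print(bool(np.abs(w - w_hat).max() <= s))
```

Because each 128-element group has its own scale and zero point, outliers in one group do not degrade the precision of the rest of the tensor, which is why group-wise compression usually loses less accuracy than a single per-tensor scale.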
## Toolchain
| Package | Version |
|---|---|
| Python | 3.11.9 |
| openvino | 2026.0.0 |
| openvino-genai | 2026.0.0.0 |
| openvino-tokenizers | 2026.0.0.0 |
| optimum-intel | 1.27.0 |
| optimum | 2.1.0 |
| nncf | 3.0.0 |
| transformers | 4.57.6 |
| torch | 2.11.0 |
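The pinned versions above could be captured in a `requirements.txt` along these lines (a sketch; the actual file shipped with the model may differ in extras or ordering):

```text
openvino==2026.0.0
openvino-genai==2026.0.0.0
openvino-tokenizers==2026.0.0.0
optimum-intel==1.27.0
optimum==2.1.0
nncf==3.0.0
transformers==4.57.6
torch==2.11.0
librosa  # only needed for the audio-loading step in the usage example
```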
## Usage
```python
import librosa
import numpy as np
import openvino_genai as ov_genai

pipe = ov_genai.WhisperPipeline("ov-whisper_large_v3_turbo-int8-2026.0.0", "CPU")

# Load audio as 16 kHz float32 mono (e.g. via librosa)
samples, _ = librosa.load("audio.wav", sr=16000, mono=True)
samples = np.asarray(samples, dtype=np.float32)

result = pipe.generate(samples)
print(result.texts[0])
```
Supported devices: CPU, GPU, NPU (tested on Intel Core Ultra 7 255H / Arc 140T / AI Boost).
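If you want to avoid the librosa dependency, a 16-bit PCM WAV can also be read with the standard-library `wave` module. This is a minimal sketch (the helper name is made up) that assumes the file is already mono and sampled at 16 kHz; resampling, which librosa handles automatically, is out of scope here.

```python
import wave

import numpy as np

def load_wav_mono_16k(path: str) -> np.ndarray:
    """Read a 16-bit PCM WAV and return float32 samples in [-1, 1]."""
    with wave.open(path, "rb") as wf:
        assert wf.getnchannels() == 1, "expected mono audio"
        assert wf.getframerate() == 16000, "expected 16 kHz sample rate"
        assert wf.getsampwidth() == 2, "expected 16-bit PCM"
        pcm = wf.readframes(wf.getnframes())
    # int16 PCM -> float32 in [-1, 1], the layout WhisperPipeline expects
    return np.frombuffer(pcm, dtype=np.int16).astype(np.float32) / 32768.0
```

The returned array can be passed to `pipe.generate(...)` exactly like the librosa-loaded samples above.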
## Reproduce the export
```shell
pip install -r requirements.txt

python export_whisper_int8_ov.py \
  --model openai/whisper-large-v3-turbo \
  --output ov-whisper_large_v3_turbo-int8-2026.0.0 \
  --cache-dir ./cache_dir
```
Or equivalently with optimum-cli directly:
```shell
optimum-cli export openvino \
  -m openai/whisper-large-v3-turbo \
  --task automatic-speech-recognition-with-past \
  --weight-format int8 \
  --group-size 128 \
  ov-whisper_large_v3_turbo-int8-2026.0.0
```
## Validate
```shell
python validate_whisper_genai.py ov-whisper_large_v3_turbo-int8-2026.0.0 --device CPU
```
## Files
- `openvino_encoder_model.bin/.xml` -- Whisper encoder (INT8)
- `openvino_decoder_model.bin/.xml` -- Whisper decoder with past/beam_idx (INT8)
- `openvino_tokenizer.bin/.xml` -- Tokenizer
- `openvino_detokenizer.bin/.xml` -- Detokenizer
- `config.json`, `generation_config.json` -- Model configuration
- `tokenizer.json`, `vocab.json`, `merges.txt` -- Tokenizer data
- `export_whisper_int8_ov.py` -- Export script used to produce this model
- `validate_whisper_genai.py` -- Smoke-test script
- `requirements.txt` -- Pinned Python dependencies
## Model lineage

hlevring/ov-whisper_large_v3_turbo-int8-2026.0.0 is exported from openai/whisper-large-v3-turbo, which is itself fine-tuned from the base model openai/whisper-large-v3.