ov-whisper_large_v3_turbo-int8-2026.0.0

openai/whisper-large-v3-turbo exported to OpenVINO IR with INT8 asymmetric weight compression (group size 128).

The layout targets openvino_genai.WhisperPipeline and includes the stateful decoder (exported with the -with-past task variant) plus the tokenizer and detokenizer models.

Whisper large-v3-turbo is a pruned and fine-tuned version of Whisper large-v3 (the decoder is reduced from 32 to 4 layers); it is roughly 6x faster with minimal quality loss and supports 99 languages.

Quantization details

Parameter                   Value
Source model                openai/whisper-large-v3-turbo
Weight format               INT8 asymmetric
Group size                  128
Encoder layers compressed   194 / 194 (100%)
Decoder layers compressed   42 / 42 (100%)
Task                        automatic-speech-recognition-with-past
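
The per-layer counts above come from the export log. To double-check that the weights in the saved IR really are stored in 8 bits, you can open a model with the OpenVINO runtime and count low-precision Constant nodes. The snippet below is only an illustrative check -- the file path is an example and the counting rule (Constant nodes with i8/u8 element type) does not necessarily match how the export script counts layers.

import openvino as ov

core = ov.Core()
model = core.read_model("ov-whisper_large_v3_turbo-int8-2026.0.0/openvino_encoder_model.xml")

# Count Constant nodes holding 8-bit data (compressed weights).
int8_consts = 0
total_consts = 0
for op in model.get_ops():
    if op.get_type_name() == "Constant":
        total_consts += 1
        if op.get_element_type() in (ov.Type.i8, ov.Type.u8):
            int8_consts += 1

print(f"{int8_consts} of {total_consts} Constant nodes are 8-bit")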

Toolchain

Package               Version
Python                3.11.9
openvino              2026.0.0
openvino-genai        2026.0.0.0
openvino-tokenizers   2026.0.0.0
optimum-intel         1.27.0
optimum               2.1.0
nncf                  3.0.0
transformers          4.57.6
torch                 2.11.0

Usage

import librosa
import numpy as np
import openvino_genai as ov_genai

pipe = ov_genai.WhisperPipeline("ov-whisper_large_v3_turbo-int8-2026.0.0", "CPU")

# Load audio as 16 kHz float32 mono (e.g. via librosa)
samples, _ = librosa.load("audio.wav", sr=16000, mono=True)
samples = np.asarray(samples, dtype=np.float32)

result = pipe.generate(samples)
print(result)

Supported devices: CPU, GPU, NPU (tested on Intel Core Ultra 7 255H / Arc 140T / AI Boost).
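
For multilingual audio, translation, or timestamped output, generation options can be passed directly to generate. The snippet below is a sketch using openvino_genai's Whisper generation parameters (language, task, return_timestamps) and targets GPU instead of CPU; adjust the device string and options to your setup.

import librosa
import openvino_genai as ov_genai

samples, _ = librosa.load("audio.wav", sr=16000, mono=True)

pipe = ov_genai.WhisperPipeline("ov-whisper_large_v3_turbo-int8-2026.0.0", "GPU")

# Force English transcription and request timestamped chunks.
result = pipe.generate(
    samples,
    language="<|en|>",
    task="transcribe",
    return_timestamps=True,
)

print(result)
for chunk in result.chunks:
    print(f"[{chunk.start_ts:.2f}s - {chunk.end_ts:.2f}s] {chunk.text}")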

Reproduce the export

pip install -r requirements.txt
python export_whisper_int8_ov.py \
    --model openai/whisper-large-v3-turbo \
    --output ov-whisper_large_v3_turbo-int8-2026.0.0 \
    --cache-dir ./cache_dir

Or equivalently with optimum-cli directly:

optimum-cli export openvino \
    -m openai/whisper-large-v3-turbo \
    --task automatic-speech-recognition-with-past \
    --weight-format int8 \
    --group-size 128 \
    ov-whisper_large_v3_turbo-int8-2026.0.0
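
The same compression settings can also be applied from Python via optimum-intel. This is a minimal sketch, not the bundled export script; the OVWeightQuantizationConfig arguments mirror the table above, but check them against the optimum-intel version pinned in requirements.txt.

from optimum.intel import OVModelForSpeechSeq2Seq, OVWeightQuantizationConfig
from transformers import AutoProcessor

model_id = "openai/whisper-large-v3-turbo"
output_dir = "ov-whisper_large_v3_turbo-int8-2026.0.0"

# INT8 asymmetric weight compression with group size 128, matching the table above.
quant_config = OVWeightQuantizationConfig(bits=8, sym=False, group_size=128)

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
model = OVModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    export=True,
    quantization_config=quant_config,
)
model.save_pretrained(output_dir)

# Keep the HF processor/tokenizer files next to the IR.
AutoProcessor.from_pretrained(model_id).save_pretrained(output_dir)

Depending on the optimum-intel version, this path may or may not emit the openvino_tokenizer/openvino_detokenizer IRs that WhisperPipeline needs; the bundled export script and the optimum-cli command above handle them.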

Validate

python validate_whisper_genai.py ov-whisper_large_v3_turbo-int8-2026.0.0 --device CPU
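
If you only need a quick inline check without the bundled script, a smoke test along the following lines is usually enough (the audio file name is a placeholder; this is not the contents of validate_whisper_genai.py):

import sys
import librosa
import openvino_genai as ov_genai

model_dir = sys.argv[1] if len(sys.argv) > 1 else "ov-whisper_large_v3_turbo-int8-2026.0.0"

pipe = ov_genai.WhisperPipeline(model_dir, "CPU")

# Any short 16 kHz mono clip containing speech works here.
samples, _ = librosa.load("audio.wav", sr=16000, mono=True)
result = pipe.generate(samples)

text = str(result).strip()
assert text, "Pipeline returned an empty transcript"
print("OK:", text)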

Files

  • openvino_encoder_model.bin/.xml -- Whisper encoder (INT8)
  • openvino_decoder_model.bin/.xml -- Whisper decoder with past/beam_idx (INT8)
  • openvino_tokenizer.bin/.xml -- Tokenizer
  • openvino_detokenizer.bin/.xml -- Detokenizer
  • config.json, generation_config.json -- Model configuration
  • tokenizer.json, vocab.json, merges.txt -- Tokenizer data
  • export_whisper_int8_ov.py -- Export script used to produce this model
  • validate_whisper_genai.py -- Smoke-test script
  • requirements.txt -- Pinned Python dependencies