ov-whisper_large_v3_turbo-int8-2026.0.0

openai/whisper-large-v3-turbo exported to OpenVINO IR with INT8 asymmetric weight compression (group size 128).

The layout targets openvino_genai.WhisperPipeline and includes the stateful decoder (exported with the -with-past task variant) plus the tokenizer and detokenizer models.

Whisper large-v3-turbo is a pruned and fine-tuned version of Whisper large-v3 (the decoder is reduced from 32 to 4 layers); it is roughly 6x faster with minimal quality loss and supports 99 languages.

Quantization details

Parameter                   Value
Source model                openai/whisper-large-v3-turbo
Weight format               INT8 asymmetric
Group size                  128
Encoder layers compressed   194 / 194 (100%)
Decoder layers compressed   42 / 42 (100%)
Task                        automatic-speech-recognition-with-past
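
The per-layer counts above come from the export log. To double-check that the weights in the saved IR really are stored in 8 bits, you can open a model with the OpenVINO runtime and count low-precision Constant nodes. The snippet below is only an illustrative check -- the file path is an example and the counting rule (Constant nodes with i8/u8 element type) does not necessarily match how the export script counts layers.

import openvino as ov

core = ov.Core()
model = core.read_model("ov-whisper_large_v3_turbo-int8-2026.0.0/openvino_encoder_model.xml")

# Count Constant nodes holding 8-bit data (compressed weights).
int8_consts = 0
total_consts = 0
for op in model.get_ops():
    if op.get_type_name() == "Constant":
        total_consts += 1
        if op.get_element_type() in (ov.Type.i8, ov.Type.u8):
            int8_consts += 1

print(f"{int8_consts} of {total_consts} Constant nodes are 8-bit")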

Toolchain

Package               Version
Python                3.11.9
openvino              2026.0.0
openvino-genai        2026.0.0.0
openvino-tokenizers   2026.0.0.0
optimum-intel         1.27.0
optimum               2.1.0
nncf                  3.0.0
transformers          4.57.6
torch                 2.11.0

Usage

import librosa
import numpy as np
import openvino_genai as ov_genai

pipe = ov_genai.WhisperPipeline("ov-whisper_large_v3_turbo-int8-2026.0.0", "CPU")

# Load audio as 16 kHz float32 mono (e.g. via librosa)
samples, _ = librosa.load("audio.wav", sr=16000, mono=True)
samples = np.asarray(samples, dtype=np.float32)

result = pipe.generate(samples)
print(result)

Supported devices: CPU, GPU, NPU (tested on Intel Core Ultra 7 255H / Arc 140T / AI Boost).
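
For multilingual audio, translation, or timestamped output, generation options can be passed directly to generate. The snippet below is a sketch using openvino_genai's Whisper generation parameters (language, task, return_timestamps) and targets GPU instead of CPU; adjust the device string and options to your setup.

import librosa
import openvino_genai as ov_genai

samples, _ = librosa.load("audio.wav", sr=16000, mono=True)

pipe = ov_genai.WhisperPipeline("ov-whisper_large_v3_turbo-int8-2026.0.0", "GPU")

# Force English transcription and request timestamped chunks.
result = pipe.generate(
    samples,
    language="<|en|>",
    task="transcribe",
    return_timestamps=True,
)

print(result)
for chunk in result.chunks:
    print(f"[{chunk.start_ts:.2f}s - {chunk.end_ts:.2f}s] {chunk.text}")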

Reproduce the export

pip install -r requirements.txt
python export_whisper_int8_ov.py \
    --model openai/whisper-large-v3-turbo \
    --output ov-whisper_large_v3_turbo-int8-2026.0.0 \
    --cache-dir ./cache_dir

Or equivalently with optimum-cli directly:

optimum-cli export openvino \
    -m openai/whisper-large-v3-turbo \
    --task automatic-speech-recognition-with-past \
    --weight-format int8 \
    --group-size 128 \
    ov-whisper_large_v3_turbo-int8-2026.0.0
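
The same compression settings can also be applied from Python via optimum-intel. This is a minimal sketch, not the bundled export script; the OVWeightQuantizationConfig arguments mirror the table above, but check them against the optimum-intel version pinned in requirements.txt.

from optimum.intel import OVModelForSpeechSeq2Seq, OVWeightQuantizationConfig
from transformers import AutoProcessor

model_id = "openai/whisper-large-v3-turbo"
output_dir = "ov-whisper_large_v3_turbo-int8-2026.0.0"

# INT8 asymmetric weight compression with group size 128, matching the table above.
quant_config = OVWeightQuantizationConfig(bits=8, sym=False, group_size=128)

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
model = OVModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    export=True,
    quantization_config=quant_config,
)
model.save_pretrained(output_dir)

# Keep the HF processor/tokenizer files next to the IR.
AutoProcessor.from_pretrained(model_id).save_pretrained(output_dir)

Depending on the optimum-intel version, this path may or may not emit the openvino_tokenizer/openvino_detokenizer IRs that WhisperPipeline needs; the bundled export script and the optimum-cli command above handle them.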

Validate

python validate_whisper_genai.py ov-whisper_large_v3_turbo-int8-2026.0.0 --device CPU
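
If you only need a quick inline check without the bundled script, a smoke test along the following lines is usually enough (the audio file name is a placeholder; this is not the contents of validate_whisper_genai.py):

import sys
import librosa
import openvino_genai as ov_genai

model_dir = sys.argv[1] if len(sys.argv) > 1 else "ov-whisper_large_v3_turbo-int8-2026.0.0"

pipe = ov_genai.WhisperPipeline(model_dir, "CPU")

# Any short 16 kHz mono clip containing speech works here.
samples, _ = librosa.load("audio.wav", sr=16000, mono=True)
result = pipe.generate(samples)

text = str(result).strip()
assert text, "Pipeline returned an empty transcript"
print("OK:", text)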

Files

  • openvino_encoder_model.bin/.xml -- Whisper encoder (INT8)
  • openvino_decoder_model.bin/.xml -- Whisper decoder with past/beam_idx (INT8)
  • openvino_tokenizer.bin/.xml -- Tokenizer
  • openvino_detokenizer.bin/.xml -- Detokenizer
  • config.json, generation_config.json -- Model configuration
  • tokenizer.json, vocab.json, merges.txt -- Tokenizer data
  • export_whisper_int8_ov.py -- Export script used to produce this model
  • validate_whisper_genai.py -- Smoke-test script
  • requirements.txt -- Pinned Python dependencies