Breeze-ASR-26 — MLX (4-bit palette quantized)

Apple MLX port of MediaTek-Research/Breeze-ASR-26, the Taiwanese Hokkien (Taigi) ASR model from MediaTek Research's MR Breeze 3 series, 4-bit palette quantized (group size 64) for compact on-device inference.

Runs on Apple Silicon Macs (M1/M2/M3/M4) via the mlx-whisper package — same API as mlx-community/whisper-* checkpoints.

Files

file	size
`weights.safetensors`	~877 MB
`config.json`	<1 KB

72% smaller than the fp16 variant (3.1 GB).
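As a rough sanity check on these sizes, the sketch below estimates checkpoint size from parameter count and effective bits per weight. It assumes about 1.55 B parameters (the published size of whisper-large-v2, the base model). It also assumes that nearly all weights are quantized and that each group of 64 weights stores one fp16 scale and one fp16 bias alongside its 4-bit codes. The small unquantized layers (convolutions, norms) are ignored.

```python
def bits_per_weight(bits: int, group_size: int, scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective storage cost per weight for group-wise quantization with a scale and bias per group."""
    # Each group stores `group_size` codes plus one scale and one bias (assumed fp16).
    return bits + (scale_bits + bias_bits) / group_size


def estimated_size_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate checkpoint size in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 1.55e9  # assumption: whisper-large-v2 parameter count

fp16_gb = estimated_size_gb(N_PARAMS, 16)
q4_gb = estimated_size_gb(N_PARAMS, bits_per_weight(bits=4, group_size=64))

print(f"fp16:  {fp16_gb:.2f} GB")
print(f"4-bit: {q4_gb:.2f} GB ({bits_per_weight(4, 64)} bits/weight)")
print(f"reduction: {1 - q4_gb / fp16_gb:.0%}")
```

Under these assumptions the estimate is about 0.87 GB at 4.5 bits per weight, a 72% reduction. That is close to the ~877 MB and 3.1 GB figures above.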

Usage

```python
import mlx_whisper

result = mlx_whisper.transcribe(
    "audio.wav",
    path_or_hf_repo="fredchu/breeze-asr-26-mlx-4bit",
    language="zh",
)
print(result["text"])
```

CLI:

```shell
mlx_whisper audio.wav --model fredchu/breeze-asr-26-mlx-4bit --language zh
```

Performance (M1 Max, mlx-whisper)

sample	duration	speed (× real-time)	speed-up vs. fp16
Mandarin financial speech	60.0 s	12.57×	+44%
Taiwanese Hokkien sample	25.0 s	11.71×	+86%
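The speed figures are multiples of real time (audio duration ÷ wall-clock processing time). This is the inverse of the conventional real-time factor (RTF = processing time ÷ audio duration, lower is better). The helpers below convert between the two. They assume the fp16 comparison is the ratio of the two speeds minus one. Wall-clock time would have to be measured separately, for example with `time.perf_counter()` around `mlx_whisper.transcribe`.

```python
def speed_x_realtime(audio_s: float, wall_s: float) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    return audio_s / wall_s


def real_time_factor(audio_s: float, wall_s: float) -> float:
    """Conventional RTF: processing time per second of audio (lower is better)."""
    return wall_s / audio_s


def speedup_pct(speed_new: float, speed_old: float) -> float:
    """Relative speed-up in percent, e.g. a 1.44x speed ratio -> 44.0."""
    return (speed_new / speed_old - 1) * 100


# Worked example from the 60 s Mandarin row (12.57x real time).
audio_s = 60.0
wall_s = audio_s / 12.57  # implied processing time, about 4.77 s
print(f"wall-clock: {wall_s:.2f} s, RTF: {real_time_factor(audio_s, wall_s):.3f}")
```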

Surprising Quality Result

In our 2026-04-30 head-to-head against the fp16 variant on a real Taigi sample (a Mandarin-speaking creator using the Taigi word 漂泊 pio-pôa), the 4-bit variant transcribed 漂泊 correctly, while fp16 produced 瀟灑 instead. The two runs were otherwise identical (same audio, same mlx-whisper version, language=zh).

This is counterintuitive, since quantization usually degrades quality. One possible, unverified explanation is that 4-bit quantization, which maps each group of 64 weights onto 16 levels, perturbs outlier weights in a way that happens to generalize better to underrepresented Taigi tokens. The result is reproducible on M1 Max, but it rests on a single word in a single sample, so verify it on your own audio.

For Taigi-heavy use cases, try this 4-bit variant first. For pure Mandarin or read-speech benchmarks, the fp16 variant remains the safer default.
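To check the 4-bit vs. fp16 difference on your own audio, one common approach is to compute the character error rate (CER) of each variant against a hand-checked reference. CER suits Chinese-character output better than word error rate. Below is a minimal pure-Python sketch using Levenshtein distance over characters. The function names are illustrative, and the hypothesis strings would come from `result["text"]` of each variant, as in Usage. It does no text normalization (punctuation, full-width vs. half-width), which you may want to add.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: ref char missing from hyp
                curr[j - 1] + 1,         # insertion: extra char in hyp
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


reference = "漂泊"
print(cer(reference, "漂泊"))  # 4-bit output from the head-to-head above → 0.0
print(cer(reference, "瀟灑"))  # fp16 output from the head-to-head above → 1.0
```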

Limitations (inherited from base model)

Outputs Mandarin Chinese characters, not Taigi orthography (台語正字 / 台羅)
Trained on ~10,000 hours of synthetic Taigi speech, so expect a distribution gap on real, spontaneous speech
English words, brand names, and proper nouns are aggressively transliterated or misrecognized: in our Mandarin test, Hello became 哈囉, Austin became Alstin, and Netflix became Nathalie 的時事. ASR-25 (MediaTek-Research/Breeze-ASR-25) handles these correctly. Do not use this model for content with frequent English code-switching.
Segments come back as single ~30-second blocks regardless of audio content (a behaviour of the model's training, not an mlx-whisper setting). Post-process them if you need finer subtitle granularity.
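One crude way to get finer subtitles is to split each long segment at Chinese punctuation. The segment's time span is then spread across the pieces in proportion to their character counts. This is a hypothetical heuristic, not an mlx-whisper feature. It assumes each segment is a dict with `start` and `end` (seconds) and `text` keys, as in `result["segments"]`. The interpolated timestamps are only approximate.

```python
import re

# Zero-width split point after full-width Chinese punctuation, keeping the mark.
_BREAK = re.compile(r"(?<=[，。！？；、])")


def split_segment(seg: dict, max_chars: int = 16) -> list[dict]:
    """Split one segment into shorter cues with proportionally interpolated times."""
    pieces = [p.strip() for p in _BREAK.split(seg["text"]) if p.strip()]
    # Cut any piece still longer than max_chars into fixed-size chunks.
    chunks = []
    for p in pieces:
        chunks.extend(p[i:i + max_chars] for i in range(0, len(p), max_chars))
    if not chunks:
        return []
    total = sum(len(c) for c in chunks)
    span = seg["end"] - seg["start"]
    cues, t = [], seg["start"]
    for c in chunks:
        dur = span * len(c) / total  # time share proportional to character count
        cues.append({"start": round(t, 3), "end": round(t + dur, 3), "text": c})
        t += dur
    cues[-1]["end"] = round(seg["end"], 3)  # pin the last cue to the segment end
    return cues


seg = {"start": 0.0, "end": 6.0, "text": "今天天氣很好，我們去散步。"}
for cue in split_segment(seg):
    print(cue)
```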

Quantization Details

parameter	value
method	group-wise affine ("palette" of 16 evenly spaced levels per group, set by a per-group scale and bias)
bits	4
group size	64

Performed via mlx.nn.quantize after weight conversion from HuggingFace transformers safetensors → MLX Whisper format.
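The following pure-Python sketch makes the table concrete. It shows group-wise 4-bit affine quantization in the min/max style that `mlx.nn.quantize` uses by default through `mx.quantize`. Each group of 64 weights gets a scale and a bias, and every weight is stored as a 4-bit code selecting one of 16 evenly spaced levels. It is a simplified illustration: MLX's kernels differ in details such as rounding and dtypes. It is not the code used to produce this checkpoint.

```python
import random


def quantize_group(w: list[float], bits: int = 4) -> tuple[list[int], float, float]:
    """Map a group of floats to integer codes in [0, 2**bits - 1] plus a scale and bias."""
    levels = 2 ** bits - 1             # 15 steps -> 16 representable values for 4 bits
    lo, hi = min(w), max(w)
    scale = (hi - lo) / levels or 1.0  # avoid division by zero for constant groups
    codes = [min(levels, max(0, round((x - lo) / scale))) for x in w]
    return codes, scale, lo


def dequantize_group(codes: list[int], scale: float, bias: float) -> list[float]:
    """Reconstruct approximate weights: w ≈ scale * code + bias."""
    return [scale * q + bias for q in codes]


GROUP_SIZE = 64
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(GROUP_SIZE)]

codes, scale, bias = quantize_group(weights, bits=4)
restored = dequantize_group(codes, scale, bias)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"distinct codes: {len(set(codes))}, max abs error: {max_err:.5f} (bound scale/2 = {scale / 2:.5f})")
```

Because every weight is rounded to the nearest level, the reconstruction error within a group is at most half the step size. That is why a single large outlier, which widens the group's min/max range, coarsens the levels for the other 63 weights.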

Conversion

Built with a custom wrapper around mlx-examples/whisper/convert.py that adds sharded-safetensors loader support (the source repo ships weights as 5 GB + 1 GB shards, which the upstream converter doesn't handle).
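Hugging Face sharded checkpoints come with a `model.safetensors.index.json` file whose `weight_map` maps each tensor name to the shard that holds it. A loader only needs to read every shard listed there and merge the results into one dictionary before the conversion step. The sketch below is hypothetical and is not the author's wrapper. It uses `mx.load`, which reads a `.safetensors` file into a dict of MLX arrays, and it assumes the standard index filename.

```python
import json
from pathlib import Path

INDEX_NAME = "model.safetensors.index.json"  # standard Hugging Face index filename


def shard_files(index: dict) -> list[str]:
    """Unique shard filenames referenced by an index's weight_map, in stable order."""
    return sorted(set(index["weight_map"].values()))


def load_sharded(repo_dir: str) -> dict:
    """Merge every shard listed in the index into a single name -> array dict."""
    import mlx.core as mx  # imported lazily; requires MLX on Apple Silicon

    root = Path(repo_dir)
    index = json.loads((root / INDEX_NAME).read_text())
    weights = {}
    for fname in shard_files(index):
        weights.update(mx.load(str(root / fname)))  # .safetensors -> dict of arrays
    missing = set(index["weight_map"]) - set(weights)
    if missing:
        raise KeyError(f"{len(missing)} tensors listed in the index were not found")
    return weights
```

Where the merged dict plugs into `convert.py` depends on the upstream script version. Dtype casting and key renaming are left to the converter.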

License

Apache 2.0 — inherits from the base model.

Acknowledgments

MediaTek Research for the original Breeze-ASR-26 weights
Apple MLX team for the framework, quantization API, and conversion tooling

Model tree for fredchu/breeze-asr-26-mlx-4bit

openai/whisper-large-v2 (base model) → MediaTek-Research/Breeze-ASR-26 (fine-tune) → fredchu/breeze-asr-26-mlx-4bit (this model)