Breeze-ASR-26 โ€” MLX (4-bit palette quantized)

Apple MLX port of MediaTek-Research/Breeze-ASR-26, the Taiwanese Hokkien (Taigi) ASR model from MediaTek Research's MR Breeze 3 series, 4-bit palette quantized (group size 64) for compact on-device inference.

Runs on Apple Silicon Macs (M1/M2/M3/M4) via the mlx-whisper package โ€” same API as mlx-community/whisper-* checkpoints.

Files

file size
weights.safetensors ~877 MB
config.json <1 KB

72% smaller than the fp16 variant (3.1 GB).

Usage

import mlx_whisper

result = mlx_whisper.transcribe(
    "audio.wav",
    path_or_hf_repo="fredchu/breeze-asr-26-mlx-4bit",
    language="zh",
)
print(result["text"])

CLI:

mlx_whisper audio.wav --model fredchu/breeze-asr-26-mlx-4bit --language zh

Performance (M1 Max, mlx-whisper)

sample duration RTF vs. fp16
Mandarin financial speech (60s) 60.0s 12.57ร— real-time +44% faster
Taiwanese Hokkien sample (25s) 25.0s 11.71ร— real-time +86% faster

Surprising Quality Result

In our 2026-04-30 head-to-head against the fp16 variant on a real Taigi sample (a Mandarin-speaking creator using the Taigi word ๆผ‚ๆณŠ pio-pรดa), the 4-bit variant transcribed ๆผ‚ๆณŠ correctly while fp16 produced ็€Ÿ็‘ instead. Both runs identical otherwise (same audio, same mlx-whisper version, language=zh).

This is counterintuitive โ€” quantization usually degrades quality. One possible explanation: 4-bit palette quantization (mapping weights to 16 representative values per group) may re-calibrate outlier weights in a way that better generalizes to underrepresented Taigi tokens. Reproducible on M1 Max; worth verifying on your own samples.

For Taigi-heavy use cases, try this 4-bit variant first. For pure Mandarin or read-speech benchmarks, the fp16 variant remains the safer default.

Limitations (inherited from base model)

  • Outputs Mandarin Chinese characters, not Taigi orthography (ๅฐ่ชžๆญฃๅญ— / ๅฐ็พ…)
  • Trained on ~10,000 hours of synthetic Taigi speech โ€” distribution gap with real spontaneous speech
  • English brand/proper nouns are aggressively transliterated: in our Mandarin test, Hello became ๅ“ˆๅ›‰, Austin became Alstin, Netflix became Nathalie ็š„ๆ™‚ไบ‹. ASR-25 (MediaTek-Research/Breeze-ASR-25) handles these correctly. Do not use this model for content with frequent English code-switching.
  • All segments come back as one ~30-second block regardless of audio content (model training behaviour, not framework setting). Post-process if you need finer subtitle granularity.

Quantization Details

parameter value
method palette (lookup-table)
bits 4
group size 64

Performed via mlx.nn.quantize after weight conversion from HuggingFace transformers safetensors โ†’ MLX Whisper format.

Conversion

Built with a custom wrapper around mlx-examples/whisper/convert.py that adds sharded-safetensors loader support (the source repo ships weights as 5 GB + 1 GB shards, which the upstream converter doesn't handle).

License

Apache 2.0 โ€” inherits from the base model.

Acknowledgments

Downloads last month
43
MLX
Hardware compatibility
Log In to add your hardware

Quantized

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for fredchu/breeze-asr-26-mlx-4bit

Finetuned
(9)
this model