Breeze-ASR-26: MLX (4-bit palette quantized)
Apple MLX port of MediaTek-Research/Breeze-ASR-26, the Taiwanese Hokkien (Taigi) ASR model from MediaTek Research's MR Breeze 3 series, 4-bit palette quantized (group size 64) for compact on-device inference.
Runs on Apple Silicon Macs (M1/M2/M3/M4) via the mlx-whisper package, with the same API as the mlx-community/whisper-* checkpoints.
Files
| file | size |
|---|---|
| weights.safetensors | ~877 MB |
| config.json | <1 KB |
72% smaller than the fp16 variant (3.1 GB).
Usage
import mlx_whisper
result = mlx_whisper.transcribe(
"audio.wav",
path_or_hf_repo="fredchu/breeze-asr-26-mlx-4bit",
language="zh",
)
print(result["text"])
CLI:
mlx_whisper audio.wav --model fredchu/breeze-asr-26-mlx-4bit --language zh
Performance (M1 Max, mlx-whisper)
| sample | duration | speed | vs. fp16 |
|---|---|---|---|
| Mandarin financial speech (60s) | 60.0s | 12.57× real-time | +44% faster |
| Taiwanese Hokkien sample (25s) | 25.0s | 11.71× real-time | +86% faster |
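The multiples in the table can be reproduced from wall-clock timings. The following is a minimal sketch, assuming that "N× real-time" means audio duration divided by processing time and that "+P% faster" is the ratio of the two speed factors minus one. The helper names and the timings in the usage lines are hypothetical, not measurements from this card.

```python
def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def percent_faster(speed_a: float, speed_b: float) -> float:
    """Return how much faster speed_a is than speed_b, in percent."""
    return (speed_a / speed_b - 1.0) * 100.0


# Hypothetical timings for a 60-second clip (illustrative, not measured).
quantized = speed_factor(60.0, 5.0)
baseline = speed_factor(60.0, 7.5)
print(f"{quantized:.2f}x vs {baseline:.2f}x: {percent_faster(quantized, baseline):+.0f}%")
```

To time a real run, wrap the mlx_whisper.transcribe call in two time.perf_counter() readings and pass the difference as processing_seconds.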
Surprising Quality Result
In our 2026-04-30 head-to-head against the fp16 variant on a real Taigi sample (a Mandarin-speaking creator using the Taigi word 漂泊, pio-pôa), the 4-bit variant transcribed 漂泊 correctly, while fp16 produced a different two-character string instead. Both runs were otherwise identical (same audio, same mlx-whisper version, language=zh).
This is counterintuitive, since quantization usually degrades quality. One possible explanation: 4-bit palette quantization (mapping weights to 16 representative values per group) may re-calibrate outlier weights in a way that better generalizes to underrepresented Taigi tokens. The result is reproducible on M1 Max, but it rests on a single sample, so verify it on your own audio.
For Taigi-heavy use cases, try this 4-bit variant first. For pure Mandarin or read-speech benchmarks, the fp16 variant remains the safer default.
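One way to check this on your own audio is to transcribe the same file with both variants and diff the text. The following is a minimal sketch using difflib from the Python standard library. The helper name and the two sample strings are hypothetical; in practice the inputs would be the result["text"] values from two mlx_whisper.transcribe calls.

```python
import difflib


def transcript_diff(reference: str, candidate: str) -> list[tuple[str, str, str]]:
    """Return (operation, reference_span, candidate_span) for every mismatch."""
    matcher = difflib.SequenceMatcher(None, reference, candidate, autojunk=False)
    return [
        (tag, reference[i1:i2], candidate[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical strings standing in for the fp16 and 4-bit transcripts.
fp16_text = "the quick brown fox"
four_bit_text = "the quick brawn fox"
for op, old, new in transcript_diff(fp16_text, four_bit_text):
    print(op, repr(old), "->", repr(new))
```

The comparison is character-level, which suits Chinese output because it has no whitespace word boundaries.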
Limitations (inherited from base model)
- Outputs Mandarin Chinese characters, not Taigi orthography (台語正字 / 台羅).
- Trained on ~10,000 hours of synthetic Taigi speech, so there is a distribution gap with real spontaneous speech.
- English brand names and proper nouns are aggressively transliterated: in our Mandarin test, Hello was rendered as two Chinese characters, Austin became Alstin, and Netflix became Nathalie followed by three Chinese characters. ASR-25 (MediaTek-Research/Breeze-ASR-25) handles these correctly. Do not use this model for content with frequent English code-switching.
- All segments come back as one ~30-second block regardless of audio content (model training behaviour, not a framework setting). Post-process if you need finer subtitle granularity.
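The post-processing mentioned in the last item could look like the following. This is a rough sketch, not part of mlx-whisper: it splits one long segment at punctuation and gives each piece a time span proportional to its character count, which only approximates the true timing. The function name, the punctuation set, and the sample segment are assumptions. The start, end, and text keys are assumed to match the segment dictionaries in the transcribe result.

```python
import re


def split_segment(segment: dict, max_chars: int = 20) -> list[dict]:
    """Split one long segment into shorter cues with interpolated timestamps."""
    text = segment["text"].strip()
    # Break after CJK or ASCII punctuation, keeping the punctuation mark.
    pieces = [p for p in re.split(r"(?<=[，。！？、,.!?])", text) if p.strip()]
    # Merge neighbouring pieces while they still fit within max_chars.
    cues: list[str] = []
    for piece in pieces:
        if cues and len(cues[-1]) + len(piece) <= max_chars:
            cues[-1] += piece
        else:
            cues.append(piece)
    total = sum(len(c) for c in cues)
    duration = segment["end"] - segment["start"]
    result, cursor = [], segment["start"]
    for cue in cues:
        # Allocate time in proportion to character count (an approximation).
        span = duration * len(cue) / total if total else 0.0
        result.append({"start": cursor, "end": cursor + span, "text": cue.strip()})
        cursor += span
    return result


# Hypothetical 30-second segment (illustrative text, not model output).
sample = {"start": 0.0, "end": 30.0, "text": "今天天氣很好，我們去公園。明天再說！"}
for cue in split_segment(sample, max_chars=8):
    print(f'{cue["start"]:.2f} -> {cue["end"]:.2f}  {cue["text"]}')
```

If accurate timing matters, forced alignment against the audio would be more reliable than this proportional split.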
Quantization Details
| parameter | value |
|---|---|
| method | palette (lookup-table) |
| bits | 4 |
| group size | 64 |
Performed via mlx.nn.quantize after weight conversion from HuggingFace transformers safetensors to MLX Whisper format.
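To make these parameters concrete, here is a minimal pure-Python sketch of group-wise 4-bit quantization: each group of weights is reduced to 16 evenly spaced levels between its minimum and maximum, and every weight is stored as a 4-bit index into that set. It is an illustration under stated assumptions, not the code used to produce this checkpoint. The function names are hypothetical, and the levels chosen inside mlx.nn.quantize may differ from this evenly spaced scheme.

```python
def quantize_group(weights: list[float], bits: int = 4) -> tuple[list[int], float, float]:
    """Map one group of weights to integer codes plus a (scale, offset) pair."""
    levels = 2 ** bits - 1  # 15 steps give 16 representable values
    low, high = min(weights), max(weights)
    scale = (high - low) / levels or 1.0  # fall back to 1.0 for a flat group
    codes = [round((w - low) / scale) for w in weights]
    return codes, scale, low


def dequantize_group(codes: list[int], scale: float, low: float) -> list[float]:
    """Rebuild approximate weights from the stored codes."""
    return [low + c * scale for c in codes]


def quantize(weights: list[float], group_size: int = 64, bits: int = 4):
    """Quantize a flat list of weights one group at a time."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


# Illustrative weights: 128 values, so two groups of 64.
weights = [i / 127 - 0.5 for i in range(128)]
groups = quantize(weights)
restored = [w for group in groups for w in dequantize_group(*group)]
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Assuming each group also stores a 16-bit scale and a 16-bit offset, the cost is 4 + 32/64 = 4.5 bits per weight, roughly 28% of fp16.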
Conversion
Built with a custom wrapper around mlx-examples/whisper/convert.py that adds sharded-safetensors loader support (the source repo ships weights as 5 GB + 1 GB shards, which the upstream converter doesn't handle).
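For context, sharded Hugging Face checkpoints ship a JSON index (conventionally model.safetensors.index.json) whose weight_map entry maps each tensor name to the shard file that holds it. The sketch below shows only the bookkeeping a sharded loader needs, grouping tensor names by shard. It is not the wrapper used for this conversion, and the function name, tensor names, and sample index are hypothetical.

```python
import json
from collections import defaultdict


def tensors_by_shard(index_json: str) -> dict[str, list[str]]:
    """Group tensor names by the shard file listed in a safetensors index."""
    weight_map = json.loads(index_json)["weight_map"]
    shards: dict[str, list[str]] = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    # Sort shards and tensor names so the load order is deterministic.
    return {name: sorted(tensors) for name, tensors in sorted(shards.items())}


# Hypothetical two-shard index, shaped like model.safetensors.index.json.
sample_index = json.dumps({
    "metadata": {"total_size": 0},
    "weight_map": {
        "model.encoder.conv1.weight": "model-00001-of-00002.safetensors",
        "model.decoder.embed_tokens.weight": "model-00002-of-00002.safetensors",
        "model.encoder.conv1.bias": "model-00001-of-00002.safetensors",
    },
})
for shard, names in tensors_by_shard(sample_index).items():
    print(shard, names)
```

A loader would then read each shard once and merge the resulting dictionaries before handing the weights to the converter.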
License
Apache 2.0, inherited from the base model.
Acknowledgments
- MediaTek Research for the original Breeze-ASR-26 weights
- Apple MLX team for the framework, quantization API, and conversion tooling