Model Card: shooding/taiwan-breeze-asr-26
Overview
A LoRA fine-tune of MediaTek-Research/Breeze-ASR-26 (itself a Whisper-large-v2 derivative for Taiwanese Hokkien / Taigi), released in CTranslate2 float16 format. It was fine-tuned with Unsloth on adi-gov-tw/Taiwan-Tongues-ASR-CE-dataset-zhtw plus optional user-provided Taigi recordings, then converted to CTranslate2 for use with faster-whisper.
Companion Mandarin repo with the same training recipe and release format: shooding/faster-whisper-large-v3-zh-TW.
Model Details
- Model Type: Encoder-decoder speech transformer (Whisper architecture), CTranslate2 format
- Language(s): Taiwanese Hokkien (Taigi, nan) with Mandarin-character output; Mandarin code-switching retained
- Developed by: shooding
- Fine-tuned from: MediaTek-Research/Breeze-ASR-26
- License: Apache 2.0
- Repository: https://github.com/shooding/taiwan-finetune
Uses
Direct Use
Transcribing Taiwanese Hokkien audio into Traditional Chinese characters (漢字, not Tâi-lô) via the faster-whisper library, in real-time or batch pipelines.
Downstream Use
Taigi voice assistants, Taigi subtitle generation, Taigi↔Mandarin code-switching transcription, low-latency on-device inference (via CT2 int8/int8_float16 quantization).
Out-of-Scope Use
- Tâi-lô romanization output (model emits Han characters only)
- Other Sinitic languages (Cantonese, Hakka, Min-dong)
- Languages outside Taigi / Mandarin / English-CS
Getting Started
GPU (Recommended)
from faster_whisper import WhisperModel

model = WhisperModel(
    'shooding/taiwan-breeze-asr-26',
    device='cuda',
    compute_type='float16',
)
segments, info = model.transcribe('taigi_clip.wav', language='zh', task='transcribe')
# segments is a lazy generator; decoding runs as it is iterated
for seg in segments:
    print(f'[{seg.start:.2f}s → {seg.end:.2f}s] {seg.text}')
CPU (int8 quantization)
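from faster_whisper import WhisperModel

# The float16 weights are quantized to int8 when the model is loaded
model = WhisperModel(
    'shooding/taiwan-breeze-asr-26',
    device='cpu',
    compute_type='int8',
)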
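The GPU and CPU configurations above differ only in `device` and `compute_type`. A minimal sketch of keeping that choice in one place follows. The helper names `pick_compute_type` and `load_model` and the `low_memory` flag are hypothetical and not part of faster-whisper; the compute type strings are the CTranslate2 values already named in this card.

```python
def pick_compute_type(device: str, low_memory: bool = False) -> str:
    """Map a device name to a CTranslate2 compute type (illustrative policy only)."""
    if device == "cuda":
        # float16 matches the released weights; int8_float16 lowers memory use
        # and may cost some accuracy.
        return "int8_float16" if low_memory else "float16"
    if device == "cpu":
        return "int8"
    raise ValueError(f"unsupported device: {device!r}")


def load_model(device: str, low_memory: bool = False):
    # Imported lazily so the policy above can be checked without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(
        "shooding/taiwan-breeze-asr-26",
        device=device,
        compute_type=pick_compute_type(device, low_memory),
    )
```

Calling `load_model("cuda", low_memory=True)` would then request `int8_float16`, the on-device option listed under Downstream Use.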
Training Details
Training Data
- Primary (regularizer): adi-gov-tw/Taiwan-Tongues-ASR-CE-dataset-zhtw (Mandarin + English CS), streaming
- Secondary (target): user-provided Taigi recordings via audiofolder
- Interleaved with CUSTOM_PROB controlling Taigi exposure (default 0.0625 ≈ 10 epochs over 200 Taigi clips)
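The "≈ 10 epochs" figure can be reproduced from the training hyperparameters listed in this card. The sketch below assumes a single training device and treats `CUSTOM_PROB` as the per-sample probability of drawing a Taigi clip; the function name `taigi_epochs` is illustrative.

```python
def taigi_epochs(max_steps, per_device_batch, grad_accum, custom_prob, n_clips):
    """Expected number of passes over the Taigi clips during training."""
    # Total samples consumed by the trainer (single device assumed).
    samples_seen = max_steps * per_device_batch * grad_accum
    # Expected number of those samples drawn from the Taigi stream.
    expected_taigi = samples_seen * custom_prob
    return expected_taigi / n_clips


# 2000 steps x effective batch 16 = 32000 samples, 1/16 of them Taigi.
epochs = taigi_epochs(
    max_steps=2000,
    per_device_batch=4,
    grad_accum=4,
    custom_prob=0.0625,
    n_clips=200,
)
print(epochs)  # 10.0
```

With a different number of clips, scaling `custom_prob` proportionally keeps the expected number of Taigi epochs the same.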
Build Pipeline
- Load MediaTek-Research/Breeze-ASR-26 via unsloth.FastModel
- Apply LoRA adapters (r=64, α=64, target: q_proj, v_proj)
- Set generation_config: language=zh, task=transcribe
- Fine-tune with Seq2SeqTrainer on the interleaved stream
- Merge LoRA → full fp16 HF model
- Convert to CTranslate2 with float16 quantization (ct2-transformers-converter)
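The final conversion step might look like the following sketch, which only assembles the `ct2-transformers-converter` command line. The directory names `merged-fp16` and `taiwan-breeze-asr-26-ct2` are hypothetical, and copying `tokenizer.json` and `preprocessor_config.json` assumes the merged checkpoint contains both files, as standard Whisper checkpoints do.

```python
import shlex


def build_convert_command(merged_dir, output_dir, quantization="float16"):
    """Assemble the CTranslate2 converter invocation for a merged HF checkpoint."""
    return [
        "ct2-transformers-converter",
        "--model", merged_dir,
        "--output_dir", output_dir,
        "--quantization", quantization,
        # Files faster-whisper expects next to the converted weights (assumed present).
        "--copy_files", "tokenizer.json", "preprocessor_config.json",
    ]


cmd = build_convert_command("merged-fp16", "taiwan-breeze-asr-26-ct2")
print(shlex.join(cmd))
# To actually run the conversion: subprocess.run(cmd, check=True)
```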
LoRA Configuration
| Parameter | Value |
|---|---|
| r | 64 |
| lora_alpha | 64 |
| target_modules | q_proj, v_proj |
| lora_dropout | 0 |
| bias | none |
| task_type | None (required for Whisper) |
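As a rough illustration, the table maps onto PEFT's `LoraConfig` as below. Whether the notebook builds a `LoraConfig` directly or passes these values through Unsloth's wrapper is not stated here, so `make_peft_config` is a sketch rather than the training code. The scaling factor follows the standard LoRA convention of `lora_alpha / r`.

```python
LORA_KWARGS = {
    "r": 64,
    "lora_alpha": 64,
    "target_modules": ["q_proj", "v_proj"],
    "lora_dropout": 0.0,
    "bias": "none",
    "task_type": None,  # left unset for Whisper, per the table
}


def lora_scaling(r, lora_alpha):
    # Standard LoRA multiplies the low-rank update B @ A by lora_alpha / r.
    return lora_alpha / r


def make_peft_config():
    # Imported lazily so the values above can be inspected without PEFT installed.
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)


print(lora_scaling(LORA_KWARGS["r"], LORA_KWARGS["lora_alpha"]))  # 1.0
```

With r equal to lora_alpha, the adapter update is applied at unit scale.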
Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| max_steps | 2000 |
| per_device_train_batch_size | 4 |
| gradient_accumulation_steps | 4 (effective batch = 16) |
| learning_rate | 1e-4 |
| warmup_steps | 100 |
| lr_scheduler_type | cosine |
| optimizer | adamw_8bit (Unsloth) |
| weight_decay | 0.001 |
| eval_steps / save_steps | 200 |
| best model metric | CER (lower is better) |
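To see what the warmup and cosine settings imply for the learning rate over the 2000 steps, the sketch below reimplements the usual linear-warmup, half-cosine-decay formula in plain Python. It assumes the scheduler runs with default settings (a single half cycle decaying to zero); `lr_at` is an illustrative name, not a library function.

```python
import math


def lr_at(step, base_lr=1e-4, warmup_steps=100, max_steps=2000):
    """Learning rate at a given optimizer step under warmup + cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 50, 100, 1050, 2000):
    print(step, f"{lr_at(step):.2e}")
```

The rate peaks at 1e-4 at step 100, passes through half of that midway through the decay (step 1050), and reaches zero at step 2000.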
Hardware & Environment
- Hardware: Google Colab (T4 / A100 class)
- LoRA Efficiency: ~2% of parameters trained, 50%+ VRAM reduction vs. full fine-tuning
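The "~2%" figure is consistent with a back-of-the-envelope count. The sketch assumes the standard Whisper large dimensions (hidden size 1280, 32 encoder and 32 decoder layers), that adapters attach to every `q_proj` and `v_proj` in encoder self-attention, decoder self-attention, and decoder cross-attention, and the ~1.5B total stated in this card.

```python
D_MODEL = 1280          # Whisper large hidden size (assumed standard)
ENCODER_LAYERS = 32
DECODER_LAYERS = 32
RANK = 64               # LoRA r from this card
TARGETS_PER_ATTENTION = 2   # q_proj and v_proj
TOTAL_PARAMS = 1.5e9    # "~1.5B" as stated in this card

# Each encoder layer has one attention block; each decoder layer has two
# (self-attention and cross-attention).
attention_blocks = ENCODER_LAYERS + 2 * DECODER_LAYERS
adapted_linears = attention_blocks * TARGETS_PER_ATTENTION

# A rank-r adapter on a d x d linear layer adds A (r x d) and B (d x r).
params_per_adapter = RANK * (D_MODEL + D_MODEL)

trainable = adapted_linears * params_per_adapter
fraction = trainable / TOTAL_PARAMS
print(trainable, f"{fraction:.2%}")
```

This gives roughly 31 million trainable parameters, about 2% of the model.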
Evaluation
Testing Data
Held-out split of adi-gov-tw/Taiwan-Tongues-ASR-CE-dataset-zhtw (200 samples). This evaluation is a
Mandarin retention signal, not a Taigi quality signal. Run separate inference on a Taigi
benchmark (e.g. Breeze Taigi test set) for target-language CER.
Metrics
CER (Character Error Rate): edit distance / reference length.
During training the notebook disables predict_with_generate and derives CER from teacher-forced
logits argmax; reported values are therefore inflated and should only be read as a monotonicity
signal alongside validation loss.
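For reference, the CER definition above (edit distance divided by reference length) can be computed as in this minimal sketch. It works on raw characters and applies no text normalization; whether the notebook strips punctuation or whitespace first is not specified here, so treat that as an open assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(cer("我愛台灣", "我愛臺灣"))  # 0.25: one substitution over four characters
```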
Technical Specifications
- Architecture: Whisper large-v2 encoder-decoder transformer (~1.5B params), Breeze-ASR-26 pre-adapted
- Format: CTranslate2, float16 (quantizable to int8 at load time)
- Software Stack: Unsloth, Transformers 4.56.2, TRL 0.22.2, PEFT, CTranslate2, faster-whisper
Citation
@misc{shooding2026taiwanbreezeasr26,
author = {shooding},
title = {taiwan-breeze-asr-26: CTranslate2 LoRA fine-tune of Breeze-ASR-26 for Taiwanese Hokkien},
year = {2026},
howpublished = {\url{https://huggingface.co/shooding/taiwan-breeze-asr-26}},
}
Acknowledgements
- MediaTek Research for Breeze-ASR-26
- adi-gov-tw for the Taiwan Tongues ASR CE dataset
- Unsloth for the LoRA/training stack