Whisper-large-v3-turbo Korean OR Speech LoRA
LoRA adapter for openai/whisper-large-v3-turbo fine-tuned on Synthetic K-OR Speech Audio v1, a Korean operating-room speech corpus.
Results
Test split: 800 clips (200 utterances × 4 voices, utterance-id stratified, seed 20260503).
| ASR | CER | vs Whisper baseline |
|---|---|---|
| Qwen3-ASR-1.7B + LoRA | 0.0334 | -59.8% |
| Whisper-v3-turbo + this adapter | 0.0431 | -48.0% |
| Whisper-v3-turbo + this adapter + Hotwords | 0.0483 | -41.8% |
| Whisper-v3-turbo + Hotwords (no LoRA) | 0.0790 | -4.7% |
| Whisper-v3-turbo (baseline) | 0.0829 | n/a |
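The "vs Whisper baseline" column reads as a relative CER change against the 0.0829 baseline. The following is a minimal sketch of that arithmetic, assuming the column is computed as (CER - baseline) / baseline. The helper name `relative_change` is illustrative and not part of any released code. Small last-digit differences from the table are possible, since the table was presumably computed from unrounded CER values.

```python
def relative_change(cer: float, baseline: float) -> float:
    """Relative CER change versus the baseline, in percent (negative is better)."""
    return (cer - baseline) / baseline * 100.0


# CER of Whisper-v3-turbo without adapter or hotwords, from the table above.
BASELINE = 0.0829

# Rounded CER values copied from the table above.
rows = {
    "Qwen3-ASR-1.7B + LoRA": 0.0334,
    "Whisper-v3-turbo + this adapter": 0.0431,
    "Whisper-v3-turbo + this adapter + Hotwords": 0.0483,
    "Whisper-v3-turbo + Hotwords (no LoRA)": 0.0790,
}

changes = {name: relative_change(cer, BASELINE) for name, cer in rows.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")

# The same helper covers the adapter-only vs adapter-plus-hotwords comparison.
hotword_penalty = relative_change(0.0483, 0.0431)
print(f"Hotwords on top of the adapter: {hotword_penalty:+.1f}%")
```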
CER by code-switching style:
| cs_style | Baseline | This adapter |
|---|---|---|
| none (pure Korean) | 0.041 | 0.028 |
| phonetic_kr | 0.076 | 0.031 |
| mixed (KR+EN) | 0.132 | 0.074 |
| english | 0.462 | 0.308 |
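For readers who want to reproduce numbers of this kind, here is a minimal sketch of character error rate (CER) as character-level edit distance divided by reference length. The text normalization used for the reported results is not stated, so stripping spaces before scoring is an assumption, and the function names `edit_distance` and `cer` are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance.

    Assumption: spaces are removed before scoring; the model card does not
    say which normalization was applied.
    """
    ref_chars = reference.replace(" ", "")
    hyp_chars = hypothesis.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# One substituted character out of four reference characters.
print(cer("abcd", "abxd"))
```

A corpus-level CER is usually the total edit count divided by the total number of reference characters, rather than the mean of per-utterance values.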
Usage
pip install transformers peft librosa
import librosa
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

base = "openai/whisper-large-v3-turbo"
adapter = "vitaldb/whisper-v3-turbo-kor-or-lora"

# Load the processor and the base model in bf16, then attach the LoRA adapter.
processor = WhisperProcessor.from_pretrained(base, language="ko", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained(base, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, adapter).to("cuda").eval()

# Force Korean transcription. Recent transformers releases may warn that
# forced_decoder_ids is deprecated in favor of generate(language=..., task=...).
forced = processor.get_decoder_prompt_ids(language="ko", task="transcribe")

# Whisper expects 16 kHz mono audio.
y, _ = librosa.load("clip.wav", sr=16000, mono=True)
feats = processor(y, sampling_rate=16000, return_tensors="pt").input_features.to("cuda", dtype=torch.bfloat16)

with torch.no_grad():
    gen = model.generate(input_features=feats, max_new_tokens=128, forced_decoder_ids=forced)
text = processor.tokenizer.batch_decode(gen, skip_special_tokens=True)[0].strip()
print(text)
Note: adding hotwords (Whisper prompt_ids) on top of this adapter degrades CER from 0.0431 to 0.0483 (about +12% relative), so do not combine the two.
Training
- Base model: openai/whisper-large-v3-turbo (MIT-licensed)
- Trainable params: 13.9M / 822.8M (1.69%)
- Target modules: q_proj, k_proj, v_proj, out_proj, fc1, fc2
- LoRA rank / alpha / dropout: 16 / 32 / 0.05
- Optimizer: AdamW, learning rate 1e-4, warmup 100 steps
- Schedule: 3 epochs, batch 4 × grad_accum 4 (effective 16), bf16, gradient checkpointing
- Hardware: NVIDIA RTX 4090 24 GB
- Train time: ~18 minutes
- Train data: 6,400 clips (1,600 utterances × 4 voices) from Synthetic K-OR Speech Audio v1
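The hyperparameters above imply a short training run. The sketch below works through the arithmetic, assuming a single GPU and no dropped partial batches. The optimizer-step counts are derived here, not reported by the authors. The `lora_kwargs` dictionary uses argument names accepted by `peft.LoraConfig`, but whether the adapter was configured exactly this way is an assumption.

```python
import math

# Values copied from the training list above.
train_clips = 6400
per_device_batch = 4
grad_accum = 4
epochs = 3
warmup_steps = 100
lora_rank, lora_alpha, lora_dropout = 16, 32, 0.05

# Derived quantities (assumption: one GPU, last partial batch kept).
effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(train_clips / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps

# Standard LoRA scales the low-rank update by alpha / rank.
lora_scale = lora_alpha / lora_rank

# Keyword arguments that could be passed as peft.LoraConfig(**lora_kwargs).
lora_kwargs = dict(
    r=lora_rank,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "fc1", "fc2"],
)

print(f"effective batch: {effective_batch}")
print(f"optimizer steps: {steps_per_epoch} per epoch, {total_steps} total")
print(f"warmup share of training: {warmup_fraction:.1%}")
print(f"LoRA scaling factor: {lora_scale}")
```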
Dataset
Trained on Synthetic K-OR Speech Audio v1: 8,000 audio clips (2,000 utterances × 4 voice profiles) synthesized with Qwen3-TTS from the text of Synthetic K-OR Speech Corpus v1.0.
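The clip counts follow from splitting on utterance id and then expanding each utterance across the four voices. The sketch below shows one way such a split could be built, so that every voice rendering of an utterance lands in the same partition. The actual split procedure is not described in this card. The integer ids, the shuffle-and-slice approach, and the name `other_ids` are hypothetical. Only the seed and the counts come from the text, and the card does not say how the remaining 200 utterances are used.

```python
import random

SEED = 20260503          # seed quoted in the Results section
NUM_UTTERANCES = 2000
NUM_VOICES = 4

# Hypothetical integer ids; the real corpus ids are not shown in this card.
utterance_ids = list(range(NUM_UTTERANCES))

rng = random.Random(SEED)
shuffled = utterance_ids[:]
rng.shuffle(shuffled)

# Split on utterance id so no utterance text appears in both train and test.
train_ids = set(shuffled[:1600])
test_ids = set(shuffled[1600:1800])
other_ids = set(shuffled[1800:])  # usage not stated in the source


def expand_to_clips(ids):
    """Pair each utterance id with every voice index to get (utterance, voice) clips."""
    return [(u, v) for u in sorted(ids) for v in range(NUM_VOICES)]


train_clips = expand_to_clips(train_ids)
test_clips = expand_to_clips(test_ids)

print(len(train_clips), len(test_clips), len(other_ids) * NUM_VOICES)
```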
Limitations
- Trained on synthetic TTS audio, not real OR recordings.
- Single-institution lexicon (SNUH conventions).
- Apply with caution to truly out-of-distribution audio.
License
Apache-2.0.
Citation
@misc{kor_or_whisper_lora_2026,
title = {Whisper-large-v3-turbo Korean OR Speech LoRA Adapter},
author = {VitalDB / Seoul National University Hospital, Department of Anesthesiology and Pain Medicine},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/vitaldb/whisper-v3-turbo-kor-or-lora}
}
Acknowledgement
This work was supported by the Korea ARPA-H Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (Grant No. 2460006561).