How to use yslinear/kotoba-whisper-v2.2-coreml with WhisperKit:
```shell
# Install CLI with Homebrew on macOS device
brew install whisperkit-cli

# View all available inference options
whisperkit-cli transcribe --help

# Download and run inference using whisper base model
whisperkit-cli transcribe --audio-path /path/to/audio.mp3

# Or use your preferred model variant
whisperkit-cli transcribe --model "large-v3" --model-prefix "distil" --audio-path /path/to/audio.mp3 --verbose
```
kotoba-whisper-v2.2 CoreML
This is the CoreML conversion of kotoba-tech/kotoba-whisper-v2.2 for use with WhisperKit.
Model Details
- Base Model: kotoba-tech/kotoba-whisper-v2.2
- Language: Japanese (ja)
- Format: CoreML (.mlmodelc)
- Optimized for: Apple Silicon (ANE - Apple Neural Engine)
Included Files
| File | Description | ANE Support |
|---|---|---|
| AudioEncoder.mlmodelc | Audio feature encoder | 100% |
| TextDecoder.mlmodelc | Text decoder | 98% |
| MelSpectrogram.mlmodelc | Mel spectrogram converter | 72% |
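
Before pointing WhisperKit at a downloaded folder, it can help to confirm that all three bundles from the table above are present. The following is a minimal sketch, not part of this repository: `check_model_dir` is a hypothetical helper name, and it only relies on the fact that a compiled `.mlmodelc` bundle is a directory on disk.

```shell
# Hypothetical helper: check that a folder holds the three compiled CoreML
# bundles listed in the table above. A .mlmodelc bundle is a directory.
check_model_dir() {
    dir="$1"
    missing=0
    for name in AudioEncoder TextDecoder MelSpectrogram; do
        if [ ! -d "$dir/$name.mlmodelc" ]; then
            echo "missing: $dir/$name.mlmodelc" >&2
            missing=1
        fi
    done
    # Exit status 0 when every bundle was found, 1 otherwise.
    return "$missing"
}
```

For example, `check_model_dir "path/to/kotoba-tech_kotoba-whisper-v2.2" && echo "model folder looks complete"` prints the message only when every bundle is found.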
Usage with WhisperKit
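```swift
import WhisperKit

// Load the converted CoreML models from a local folder.
let whisperKit = try await WhisperKit(
    modelFolder: "path/to/kotoba-tech_kotoba-whisper-v2.2"
)

// Transcribe a Japanese audio file; the language is set through DecodingOptions.
let result = try await whisperKit.transcribe(
    audioPath: "path/to/audio.wav",
    decodeOptions: DecodingOptions(language: "ja")
)
```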
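
The same transcription can probably be run from the command line against the local model folder. This is a sketch under assumptions: `transcribe_ja` is a hypothetical wrapper name, and the `--model-path` and `--language` options are assumed to be available in the installed `whisperkit-cli` version, so check `whisperkit-cli transcribe --help` first.

```shell
# Hypothetical wrapper around whisperkit-cli for Japanese transcription.
# Assumes the --model-path and --language options are available in the
# installed version; confirm with `whisperkit-cli transcribe --help`.
transcribe_ja() {
    model_dir="$1"   # folder containing the .mlmodelc bundles
    audio_file="$2"  # audio file to transcribe
    whisperkit-cli transcribe \
        --model-path "$model_dir" \
        --audio-path "$audio_file" \
        --language ja
}
```

A call would look like `transcribe_ja "path/to/kotoba-tech_kotoba-whisper-v2.2" "path/to/audio.wav"`.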
Notes
- This is a distilled model with only 2 decoder layers (vs 32 in the original Whisper large model)
- Token-level timestamps are disabled because the alignment heads configuration is incompatible with the distilled architecture
License
This model is released under the Apache License 2.0, following the original model's license.
Attribution
This is a derivative work based on:
- kotoba-tech/kotoba-whisper-v2.2 - The original Japanese Whisper model by Kotoba Technologies
- OpenAI Whisper - The base Whisper architecture
- Distil-Whisper - Distillation codebase
- ReazonSpeech - Japanese speech dataset
Acknowledgments
- kotoba-tech for the original model
- argmaxinc for WhisperKit and whisperkittools