How to use aufklarer/Qwen3-ForcedAligner-0.6B-8bit with MLX:

    # Download the model from the Hub
    # (the extra is quoted so that zsh, the default macOS shell, does not expand the brackets)
    pip install "huggingface_hub[hf_xet]"
    huggingface-cli download --local-dir Qwen3-ForcedAligner-0.6B-8bit aufklarer/Qwen3-ForcedAligner-0.6B-8bit

8-bit quantized version of Qwen/Qwen3-ForcedAligner-0.6B for Apple Silicon inference via MLX.
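As a rough illustration of what 8-bit group-wise quantization means for storage, the sketch below estimates the effective bits per weight. It assumes that each group of weights shares one 16-bit scale and one 16-bit bias, as in MLX's affine quantization scheme, and uses the `group_size=64` listed for the text decoder. The function name and the `overheadBitsPerGroup` parameter are hypothetical and not part of any library.

```swift
// Effective bits per weight for group-wise affine quantization.
// Assumption: every group stores one 16-bit scale and one 16-bit bias
// (32 bits of overhead per group) in addition to the quantized weights.
func effectiveBitsPerWeight(bits: Int, groupSize: Int, overheadBitsPerGroup: Int = 32) -> Double {
    return Double(bits) + Double(overheadBitsPerGroup) / Double(groupSize)
}

// 8-bit weights with groups of 64: 8 + 32/64 bits per weight.
let bitsPerWeight = effectiveBitsPerWeight(bits: 8, groupSize: 64)
print(bitsPerWeight) // 8.5
```

Under these assumptions the quantized decoder layers cost about 8.5 bits per weight rather than exactly 8. The audio encoder and classify head stay in float16, so the overall file size does not scale with this number alone.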
Predicts word-level timestamps for audio+text pairs in a single non-autoregressive forward pass.
| Detail | Value |
|---|---|
| Audio encoder | 24 layers, 1024 dim, 16 heads, float16 |
| Text decoder | 28 layers, 1024 hidden, 16Q/8KV heads, 8-bit quantized (group_size=64) |
| Classify head | Linear(1024, 5000), float16 |
| Timestamp resolution | 80ms per class (5000 classes = 400s max) |
| Total size | ~1.4 GB |
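
The timestamp resolution row can be read as a simple mapping from a predicted class to a time offset. The sketch below assumes that class index `k` denotes an offset of `k × 80 ms` and that the prediction for a timestamp slot is the argmax over the 5000 logits of the classify head. Neither detail is confirmed by this card, and `argmax` and `seconds(forClass:)` are hypothetical helpers.

```swift
// Values taken from the table: 80 ms per class, 5000 classes.
let millisecondsPerClass = 80
let classCount = 5000

// Index of the largest logit (returns 0 for an empty array).
func argmax(_ logits: [Float]) -> Int {
    var best = 0
    for i in logits.indices where logits[i] > logits[best] { best = i }
    return best
}

// Assumption: class index k corresponds to an offset of k * 80 ms.
func seconds(forClass index: Int) -> Double {
    precondition((0..<classCount).contains(index), "class index out of range")
    return Double(index * millisecondsPerClass) / 1000.0
}

// Toy logits for one timestamp slot, with the peak at class 25.
var logits = [Float](repeating: 0, count: classCount)
logits[25] = 1
let predicted = argmax(logits)
print(seconds(forClass: predicted)) // 2.0
```

With this mapping the 5000 classes span roughly 400 s of audio, which matches the maximum stated in the table.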
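
Swift usage:

    // Load the 8-bit model by its Hub ID.
    let aligner = try await Qwen3ForcedAligner.fromPretrained(
        modelId: "aufklarer/Qwen3-ForcedAligner-0.6B-8bit"
    )
    // `samples` holds the audio to align; `text` is its transcript.
    let aligned = aligner.align(
        audio: samples, text: "Hello world", sampleRate: 24000
    )
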
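
The card does not show what `align` returns. As a minimal sketch, assuming each result element carries the word text plus start and end times in seconds, the output could be printed as follows. `AlignedWord`, its fields, and `formatLine` are hypothetical stand-ins, not the library's actual types.

```swift
import Foundation

// Hypothetical stand-in for one element of the alignment result.
struct AlignedWord {
    let text: String
    let start: Double  // seconds
    let end: Double    // seconds
}

// Render one word as "[start - end] text" with two decimal places.
func formatLine(_ word: AlignedWord) -> String {
    let start = String(format: "%.2f", word.start)
    let end = String(format: "%.2f", word.end)
    return "[\(start)s - \(end)s] \(word.text)"
}

// Made-up example values on the 80 ms grid, for illustration only.
let words = [
    AlignedWord(text: "Hello", start: 0.08, end: 0.48),
    AlignedWord(text: "world", start: 0.56, end: 0.96),
]
for word in words {
    print(formatLine(word))
}
```

In real code the loop would iterate over the value returned by `align`, using whatever property names the library actually exposes.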
| Variant | Size | Model ID |
|---|---|---|
| 4-bit | ~979 MB | aufklarer/Qwen3-ForcedAligner-0.6B-4bit |
| 8-bit | ~1.4 GB | aufklarer/Qwen3-ForcedAligner-0.6B-8bit |
| bf16 | ~1.8 GB | aufklarer/Qwen3-ForcedAligner-0.6B-bf16 |
| CoreML INT4 | ~630 MB | aufklarer/Qwen3-ForcedAligner-0.6B-CoreML-INT4 |
| CoreML INT8 | ~1.0 GB | aufklarer/Qwen3-ForcedAligner-0.6B-CoreML-INT8 |
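
One way to use the table is to pick the largest MLX variant that fits a given size budget. The sketch below copies the three MLX rows and their approximate sizes; `Variant` and `largestVariant(fitting:)` are hypothetical names. Download size is only a rough proxy for runtime memory use.

```swift
struct Variant {
    let name: String
    let modelId: String
    let approxSizeMB: Int  // approximate download size from the table
}

// MLX variants copied from the table (GB values converted at 1000 MB per GB).
let mlxVariants = [
    Variant(name: "4-bit", modelId: "aufklarer/Qwen3-ForcedAligner-0.6B-4bit", approxSizeMB: 979),
    Variant(name: "8-bit", modelId: "aufklarer/Qwen3-ForcedAligner-0.6B-8bit", approxSizeMB: 1400),
    Variant(name: "bf16", modelId: "aufklarer/Qwen3-ForcedAligner-0.6B-bf16", approxSizeMB: 1800),
]

// Largest variant whose approximate size fits the budget, or nil if none does.
func largestVariant(fitting budgetMB: Int) -> Variant? {
    return mlxVariants
        .filter { $0.approxSizeMB <= budgetMB }
        .max { $0.approxSizeMB < $1.approxSizeMB }
}

if let choice = largestVariant(fitting: 1500) {
    print(choice.modelId) // aufklarer/Qwen3-ForcedAligner-0.6B-8bit
}
```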