Qwen3-ASR-1.7B → Core AI
Qwen3-ASR-1.7B speech-to-text converted for Apple Core AI, running on-device (iPhone + Mac).
The zoo's first ASR model: an AuT audio encoder feeding a Qwen3 decoder on the pipelined engine
(audio embeds bound to one static input buffer; {lang}<asr_text>{text} output). ≤30 s clips,
52 languages, automatic language detection.
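To make the `{lang}<asr_text>{text}` output format concrete, here is a minimal sketch of splitting such a raw string into its language and text parts. `ParsedTranscript` and `parseASROutput` are hypothetical names for illustration, not CoreAIKit API, and the sample string is made up; the kit already returns the language and text separately, so this is only needed if you handle raw decoder output yourself.

```swift
import Foundation

/// Hypothetical container for a parsed decoder output (not a CoreAIKit type).
struct ParsedTranscript: Equatable {
    let language: String
    let text: String
}

/// Splits a raw string of the form `{lang}<asr_text>{text}` at the first marker.
/// Returns nil when the marker is missing.
func parseASROutput(_ raw: String, marker: String = "<asr_text>") -> ParsedTranscript? {
    guard let range = raw.range(of: marker) else { return nil }
    // Everything before the marker is treated as the language tag.
    let language = raw[..<range.lowerBound].trimmingCharacters(in: .whitespacesAndNewlines)
    // Everything after the marker is treated as the transcript.
    let text = raw[range.upperBound...].trimmingCharacters(in: .whitespacesAndNewlines)
    return ParsedTranscript(language: language, text: text)
}

// Usage with a made-up raw string:
if let parsed = parseASROutput("English<asr_text>hello world") {
    print(parsed.language, "|", parsed.text)
}
```

Splitting on the first marker only means any later occurrence of the marker inside the transcript stays part of the text.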
Use it
▶️ Run it (source) with the Transcribe runner (GUI + CLI, one app for every speech-to-text model in the catalog):
git clone https://github.com/john-rocky/coreai-kit
open coreai-kit/Examples/Transcribe/Transcribe.xcodeproj
# Run, then pick "Qwen3-ASR 1.7B" in the model picker
# agents / headless (macOS):
cd coreai-kit/Examples/Transcribe
swift run transcribe-cli --model qwen3-asr-1.7b --audio sample.wav
💻 Build with it (complete; the glue is kit API, so copy-paste runs):
import CoreAIKit
let transcriber = try await KitTranscriber(catalog: "qwen3-asr-1.7b")
let samples = try AudioFile.pcm16kMono(url) // any wav/m4a/mp3 → 16 kHz mono Float
let result = try await transcriber.transcribe(samples: samples)
// result.text, result.language (52 languages)
The take-home is Examples/Transcribe/Sources/QuickStart.swift:
this exact code as one typed function, no UI; both the runner's GUI and its CLI call it.
Recording? MicRecorder (kit API) captures mic audio as 16 kHz mono [Float]; the record
button and permission prompt are your app's own chrome.
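Because the model takes clips of at most 30 s and the audio is 16 kHz mono `[Float]`, one clip holds at most 30 × 16,000 = 480,000 samples. The following is a minimal sketch of cutting a longer recording into windows of that size before transcribing each one. `splitIntoClips` is a hypothetical helper, not kit API, and this README does not say whether the kit already chunks long audio for you, so treat it as an assumption-laden illustration.

```swift
/// Splits mono samples into consecutive windows of at most `maxSeconds` each.
/// Hypothetical helper; defaults match the 16 kHz / 30 s figures in this README.
func splitIntoClips(_ samples: [Float],
                    sampleRate: Int = 16_000,
                    maxSeconds: Int = 30) -> [[Float]] {
    let window = sampleRate * maxSeconds   // 480_000 samples with the defaults
    guard window > 0, !samples.isEmpty else { return [] }
    return stride(from: 0, to: samples.count, by: window).map { start in
        // The last window may be shorter than `window`.
        Array(samples[start..<min(start + window, samples.count)])
    }
}

// Usage: a 62.5 s buffer of silence becomes two full clips plus a remainder.
let demoClips = splitIntoClips([Float](repeating: 0, count: 1_000_000))
print(demoClips.map(\.count))
```

A fixed-window split can cut a word at a boundary; splitting at silences would avoid that, at the cost of more logic.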
Integration checklist
- SPM: https://github.com/john-rocky/coreai-kit → product CoreAIKit
- Info.plist: NSMicrophoneUsageDescription, only if you record
- Entitlements: none needed (macOS)
- First run downloads the model (3.1 GB on Mac), then it loads from the local cache (Application Support; progress via the downloadProgress callback)
- Measure in Release: Debug is ~3× slower on per-token host work
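For the SPM item, a `Package.swift` manifest might look like the sketch below. The repository URL and the product name CoreAIKit come from the checklist; the package name `MyTranscriberApp`, the `branch: "main"` requirement, and the tools version are assumptions for illustration, and the `platforms` entry is left out because this README does not state minimum OS versions.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyTranscriberApp", // hypothetical package name
    dependencies: [
        // URL from the checklist; tracking "main" is an assumption, pin a tag if one exists.
        .package(url: "https://github.com/john-rocky/coreai-kit", branch: "main"),
    ],
    targets: [
        .executableTarget(
            name: "MyTranscriberApp",
            dependencies: [
                // Product name from the checklist; the package identity is
                // derived from the last path component of the URL.
                .product(name: "CoreAIKit", package: "coreai-kit"),
            ]
        ),
    ]
)
```

In an Xcode app project the same dependency is added through File > Add Package Dependencies with the URL above.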
Driven by CoreAIKit KitASRModel:
let asr = try await KitASRModel(model: .qwen3ASR1_7B)
let r = try await asr.transcribe(samples: pcm16kMono) // -> (language, text)
Layout: gpu-pipelined/ holds the decoder bundle (*_decode_int8hu_n390_s1, int8) + the paired
AuT encoder (*_audio_encoder_fp16_k30, fp16). Same bundles on iOS and macOS.
App: coreai-audio (Transcribe tab β pick Qwen3-ASR or Whisper large-v3-turbo). Card: zoo/qwen3-asr.md.