Qwen3-ASR-1.7B - Core AI

Qwen3-ASR-1.7B speech-to-text, converted for Apple Core AI and running on-device (iPhone + Mac). It is the zoo's first ASR model: an AuT audio encoder feeds a Qwen3 decoder on the pipelined engine (audio embeds are bound to one static input buffer; output has the form {lang}<asr_text>{text}). It handles clips of ≤30 s, covers 52 languages, and detects the language automatically.
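The {lang}<asr_text>{text} output form can be split into its two parts with plain string handling. The sketch below is only an illustration: `ParsedTranscript` and `parseASROutput` are hypothetical names, not CoreAIKit API, and it assumes the raw decoder string contains the literal marker `<asr_text>` at most once before the text. The kit's own transcriber already returns the language and text separately, so this matters only if you work with the raw string.

```swift
import Foundation

/// Parsed form of the `{lang}<asr_text>{text}` decoder output.
/// Hypothetical type for illustration; not part of CoreAIKit.
struct ParsedTranscript: Equatable {
    let language: String?   // nil when nothing precedes the marker
    let text: String
}

/// Splits a raw decoder string at the first `<asr_text>` marker.
/// Assumption: the marker is a literal substring of the decoded output.
func parseASROutput(_ raw: String) -> ParsedTranscript {
    let marker = "<asr_text>"
    guard let range = raw.range(of: marker) else {
        // No marker: treat the whole string as transcript text.
        return ParsedTranscript(
            language: nil,
            text: raw.trimmingCharacters(in: .whitespacesAndNewlines))
    }
    let lang = raw[..<range.lowerBound]
        .trimmingCharacters(in: .whitespacesAndNewlines)
    let text = raw[range.upperBound...]
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return ParsedTranscript(language: lang.isEmpty ? nil : lang, text: text)
}
```

Returning `nil` for a missing language keeps the "no prefix" case distinct from an empty string, so callers can fall back to their own default.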

Use it

▶️ Run it (source) with the Transcribe runner (GUI + CLI, one app for every speech-to-text model in the catalog):

git clone https://github.com/john-rocky/coreai-kit
open coreai-kit/Examples/Transcribe/Transcribe.xcodeproj
# → Run, then pick "Qwen3-ASR 1.7B" in the model picker

# agents / headless (macOS):
cd coreai-kit/Examples/Transcribe
swift run transcribe-cli --model qwen3-asr-1.7b --audio sample.wav

💻 Build with it. The snippet is complete; the glue is kit API, so it runs when copy-pasted:

import CoreAIKit

let transcriber = try await KitTranscriber(catalog: "qwen3-asr-1.7b")
let samples = try AudioFile.pcm16kMono(url)  // any wav/m4a/mp3 → 16 kHz mono Float
let result = try await transcriber.transcribe(samples: samples)
// result.text, result.language (52 languages)

The take-home is Examples/Transcribe/Sources/QuickStart.swift: this exact code as one typed function, no UI; both the runner's GUI and its CLI call it. Recording? MicRecorder (kit API) captures mic audio as 16 kHz mono [Float]; the record button and permission prompt are your app's own chrome.
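Because the model takes clips of at most 30 s and the samples are 16 kHz mono [Float], longer recordings need to be cut into windows of at most 480,000 samples before transcription. The source does not say how the kit treats longer input, so the following is a minimal sketch under that assumption. `chunkSamples` is a hypothetical helper, not kit API, and it cuts at fixed offsets, so a word that straddles a boundary may be split.

```swift
/// Splits mono samples into consecutive windows of at most `maxSeconds`.
/// Hypothetical helper for illustration; not part of CoreAIKit.
/// Naive fixed-offset split: no overlap and no silence detection.
func chunkSamples(_ samples: [Float],
                  sampleRate: Int = 16_000,
                  maxSeconds: Int = 30) -> [[Float]] {
    let window = sampleRate * maxSeconds   // 480,000 samples at the defaults
    guard window > 0, !samples.isEmpty else { return [] }
    return stride(from: 0, to: samples.count, by: window).map { start in
        // The last window may be shorter than `window`.
        Array(samples[start..<min(start + window, samples.count)])
    }
}
```

Each chunk can then be passed to transcriber.transcribe(samples:) in turn and the resulting texts joined; splitting at pauses instead of fixed offsets would likely give cleaner boundaries.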

Integration checklist

  • SPM: https://github.com/john-rocky/coreai-kit → product CoreAIKit
  • Info.plist: NSMicrophoneUsageDescription (only if you record)
  • Entitlements: none needed (macOS)
  • First run downloads the model (3.1 GB on Mac), then it loads from the local cache (Application Support; progress via the downloadProgress callback)
  • Measure in Release: Debug is ~3× slower on per-token host work
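For the measurement step, a common way to report ASR speed is the real-time factor (processing time divided by audio duration; below 1 means faster than real time). The sketch below is one way to compute it. `realTimeFactor` and `timed` are hypothetical helpers, not kit API, and the 16 kHz default mirrors the sample format used in this card.

```swift
import Foundation

/// Real-time factor: processing time divided by audio duration.
/// Hypothetical helper for illustration; not part of CoreAIKit.
func realTimeFactor(processingSeconds: Double,
                    sampleCount: Int,
                    sampleRate: Double = 16_000) -> Double {
    let audioSeconds = Double(sampleCount) / sampleRate
    precondition(audioSeconds > 0, "audio must contain at least one sample")
    return processingSeconds / audioSeconds
}

/// Runs `work` and returns its result with the elapsed seconds,
/// measured on the monotonic uptime clock.
func timed<T>(_ work: () async throws -> T) async rethrows
    -> (value: T, seconds: Double) {
    let start = DispatchTime.now().uptimeNanoseconds
    let value = try await work()
    let elapsed = DispatchTime.now().uptimeNanoseconds - start
    return (value, Double(elapsed) / 1_000_000_000)
}
```

Wrapping the call as `try await timed { try await transcriber.transcribe(samples: samples) }` and passing the elapsed seconds with `samples.count` to `realTimeFactor` gives a number that is comparable across clips; take it from a Release build, and consider discarding the first run, which includes model loading.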

Driven by CoreAIKit KitASRModel:

let asr = try await KitASRModel(model: .qwen3ASR1_7B)
let r = try await asr.transcribe(samples: pcm16kMono)   // -> (language, text)

Layout: gpu-pipelined/ holds the decoder bundle (*_decode_int8hu_n390_s1, int8) + the paired AuT encoder (*_audio_encoder_fp16_k30, fp16). Same bundles on iOS and macOS.

App: coreai-audio (Transcribe tab: pick Qwen3-ASR or Whisper large-v3-turbo). Card: zoo/qwen3-asr.md.
