Towards Comprehensive Semantic Speech Embeddings for Chinese Dialects Paper • 2601.07274 • Published Jan 12, 2026
PWESuite: Phonetic Word Embeddings and Tasks They Facilitate Paper • 2304.02541 • Published Apr 5, 2023
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks Paper • 2411.05361 • Published Nov 8, 2024
POWSM: A Phonetic Open Whisper-Style Speech Foundation Model Paper • 2510.24992 • Published Oct 28, 2025
Proactive Detection of Voice Cloning with Localized Watermarking Paper • 2401.17264 • Published Jan 30, 2024
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9, 2024
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion Paper • 2308.02560 • Published Aug 2, 2023