End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions Paper • 2601.17640 • Published 25 days ago • 5
daVinci-Dev: Agent-native Mid-training for Software Engineering Paper • 2601.18418 • Published 24 days ago • 124
Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis Paper • 2601.14417 • Published 30 days ago • 5
HeartMuLa: A Family of Open Sourced Music Foundation Models Paper • 2601.10547 • Published Jan 15 • 42
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published Jan 6 • 47
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published Jan 11 • 212
Rethinking Video Generation Model for the Embodied World Paper • 2601.15282 • Published 29 days ago • 43
Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe Paper • 2508.01691 • Published Aug 3, 2025 • 10
tiantiaf/whisper-large-v3-msp-podcast-emotion Audio Classification • 2B • Updated Aug 10, 2025 • 948 • 5