MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos Paper • 2603.14145 • Published Mar 14 • 14
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data Paper • 2603.09206 • Published Mar 10 • 53
Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge Paper • 2505.07365 • Published May 12, 2025
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence Paper • 2508.13992 • Published Aug 19, 2025 • 7
Introducing SSBD+ Dataset with a Convolutional Pipeline for detecting Self-Stimulatory Behaviours in Children using raw videos Paper • 2311.15072 • Published Nov 25, 2023
UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders Paper • 2601.17950 • Published Jan 25 • 4
Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following Paper • 2511.21662 • Published Nov 26, 2025 • 11
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence Paper • 2511.07384 • Published Nov 10, 2025 • 19
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation Paper • 2510.06303 • Published Oct 7, 2025 • 15
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation Paper • 2510.06303 • Published Oct 7, 2025 • 15
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation Paper • 2510.06303 • Published Oct 7, 2025 • 15
Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models Paper • 2412.06748 • Published Dec 9, 2024 • 3
DynaGuard: A Dynamic Guardrail Model With User-Defined Policies Paper • 2509.02563 • Published Sep 2, 2025 • 21
Exploiting Sparsity for Long Context Inference: Million Token Contexts on Commodity GPUs Paper • 2502.06766 • Published Feb 10, 2025
Do Audio-Language Models Understand Linguistic Variations? Paper • 2410.16505 • Published Oct 21, 2024 • 1
Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge Paper • 2505.07365 • Published May 12, 2025
Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models Paper • 2507.08128 • Published Jul 10, 2025 • 14
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence Paper • 2508.13992 • Published Aug 19, 2025 • 7
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence Paper • 2508.13992 • Published Aug 19, 2025 • 7
Query Attribute Modeling: Improving search relevance with Semantic Search and Meta Data Filtering Paper • 2508.04683 • Published Aug 6, 2025