π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published 5 days ago • 90
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published 11 days ago • 155
Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation Paper • 2509.21989 • Published Sep 26, 2025 • 23
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17, 2025 • 79
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation Paper • 2509.14760 • Published Sep 18, 2025 • 53
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models Paper • 2505.14810 • Published May 20, 2025 • 62
Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity Paper • 2505.11107 • Published May 16, 2025 • 29
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Paper • 2505.01658 • Published May 3, 2025 • 40