Controlled Self-Evolution for Algorithmic Code Optimization Paper • 2601.07348 • Published 14 days ago • 112
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published 12 days ago • 124
OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent Paper • 2601.07779 • Published 14 days ago • 27
MMLongCite: A Benchmark for Evaluating Fidelity of Long-Context Vision-Language Models Paper • 2510.13276 • Published Oct 15, 2025
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 213
MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark Paper • 2512.01603 • Published Dec 1, 2025
Agent2World: Learning to Generate Symbolic World Models via Adaptive Multi-Agent Feedback Paper • 2512.22336 • Published about 1 month ago • 2
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning Paper • 2601.06002 • Published 17 days ago • 50
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning Paper • 2601.06002 • Published 17 days ago • 50
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning Paper • 2601.06002 • Published 17 days ago • 50
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans? Paper • 2512.13281 • Published Dec 15, 2025 • 64
ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models Paper • 2510.06014 • Published Oct 7, 2025 • 10