Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization Paper • 2602.22675 • Published 20 days ago • 22
\$OneMillion-Bench: How Far are Language Agents from Human Experts? Paper • 2603.07980 • Published 9 days ago • 26
Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining Paper • 2603.11103 • Published 7 days ago • 8
Visual-Aware CoT: Achieving High-Fidelity Visual Consistency in Unified Models Paper • 2512.19686 • Published Dec 22, 2025
Context Forcing: Consistent Autoregressive Video Generation with Long Context Paper • 2602.06028 • Published Feb 5 • 36
Context Forcing: Consistent Autoregressive Video Generation with Long Context Paper • 2602.06028 • Published Feb 5 • 36
Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities Paper • 2601.21937 • Published Jan 29 • 19
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation Paper • 2601.21420 • Published Jan 29 • 42
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents Paper • 2512.12730 • Published Dec 14, 2025 • 50
AutoMV: An Automatic Multi-Agent System for Music Video Generation Paper • 2512.12196 • Published Dec 13, 2025 • 6
Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements Paper • 2512.24867 • Published Dec 31, 2025 • 1
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space Paper • 2512.24617 • Published Dec 31, 2025 • 65
AInsteinBench: Benchmarking Coding Agents on Scientific Repositories Paper • 2512.21373 • Published Dec 24, 2025
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning Paper • 2601.06002 • Published Jan 9 • 57
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models Paper • 2512.13607 • Published Dec 15, 2025 • 37
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents Paper • 2512.12730 • Published Dec 14, 2025 • 50
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence Paper • 2511.18538 • Published Nov 23, 2025 • 302
MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity Paper • 2511.03146 • Published Nov 5, 2025 • 8