Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts Paper • 2606.05922 • Published 25 days ago • 69
Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement Paper • 2605.26952 • Published May 26 • 16
OmniGUI: Benchmarking GUI Agents in Omni-Modal Smartphone Environments Paper • 2605.18758 • Published Apr 3 • 16
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published May 14 • 147
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices Paper • 2605.10933 • Published May 11 • 4
SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents Paper • 2604.17308 • Published Apr 19 • 23
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 509
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 638
OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training Paper • 2603.28858 • Published Mar 30 • 9
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 344
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 352
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published Mar 26 • 134
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published Mar 23 • 138
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding Paper • 2603.19235 • Published Mar 19 • 95
InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published Mar 17 • 312
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions Paper • 2603.15612 • Published Mar 16 • 153
Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning Paper • 2603.04597 • Published Mar 4 • 211
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published Feb 9 • 266