MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery • Paper • 2606.06473 • Published 2 days ago • 4
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints • Paper • 2606.05622 • Published 2 days ago • 34
TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration • Paper • 2606.04743 • Published 3 days ago • 36
BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution • Paper • 2606.01286 • Published 6 days ago • 5
AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks? • Paper • 2606.05080 • Published 3 days ago • 27
K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts • Paper • 2606.02404 • Published 5 days ago • 53
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources • Paper • 2605.29250 • Published 9 days ago • 76
AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation • Paper • 2605.28655 • Published 10 days ago • 11
ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations • Paper • 2605.27908 • Published 10 days ago • 6
Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents • Paper • 2605.28775 • Published 10 days ago • 38
ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence • Paper • 2605.26340 • Published 12 days ago • 35
ResearchMath-14K: Scaling Research-Level Mathematics via Agents • Paper • 2605.28003 • Published 10 days ago • 49
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players • Paper • 2605.28816 • Published 10 days ago • 419
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning • Paper • 2605.28774 • Published 10 days ago • 89
ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation • Paper • 2605.28293 • Published 10 days ago • 87
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents • Paper • 2605.25624 • Published 12 days ago • 33
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows • Paper • 2605.14678 • Published 18 days ago • 104
Self-Improving CAD Generation Agents with Finite Element Analysis as Feedback • Paper • 2605.17448 • Published 20 days ago • 19
SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models • Paper • 2605.23345 • Published 15 days ago • 17