Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos Paper • 2603.22529 • Published Mar 23 • 7
3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code Paper • 2606.01057 • Published 9 days ago • 7
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published Feb 13 • 62