Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos Paper • 2603.22529 • Published Mar 23 • 7
3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code Paper • 2606.01057 • Published 9 days ago • 7
ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation Paper • 2308.09242 • Published Aug 18, 2023
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published Feb 13 • 62