AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios Paper • 2602.23166 • Published 22 days ago • 44
LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth Paper • 2602.07962 • Published Feb 8 • 24