Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Paper • 2604.06132 • Published 2 days ago • 91
GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation Paper • 2512.17495 • Published Dec 19, 2025 • 20