EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments Paper • 2606.13681 • Published 2 days ago
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World Paper • 2605.05163 • Published May 6
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published Feb 13, 2025
Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games Paper • 2412.13602 • Published Dec 18, 2024