Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents Paper • 2510.23691 • Published Oct 27, 2025 • 57
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models Paper • 2511.09515 • Published Nov 12, 2025 • 21
Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising Paper • 2511.08633 • Published Nov 9, 2025 • 58
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published Nov 12, 2025 • 218
EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery Paper • 2606.13662 • Published 7 days ago • 27
World Model Self-Distillation: Training World Models to Solve General Tasks Paper • 2606.12072 • Published 8 days ago • 13
World Pilot: Steering Vision-Language-Action Models with World-Action Priors Paper • 2606.12403 • Published 8 days ago • 25
VideoMDM: Towards 3D Human Motion Generation From 2D Supervision Paper • 2606.13364 • Published 7 days ago • 20
OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics Paper • 2606.09826 • Published 10 days ago • 19
SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks Paper • 2606.09669 • Published 10 days ago • 43
EarlyTom: Early Token Compression Completes Fast Video Understanding Paper • 2605.30010 • Published 21 days ago • 32
PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft Paper • 2605.27762 • Published 23 days ago • 7
TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction Paper • 2605.26115 • Published 24 days ago • 52
Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models Paper • 2605.26895 • Published 23 days ago • 20
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 23 days ago • 141