Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published 22 days ago • 144
ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors Paper • 2603.04338 • Published Mar 4 • 24
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields Paper • 2601.03252 • Published Jan 6 • 104
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data Paper • 2507.07095 • Published Jul 9, 2025 • 56
MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning Paper • 2509.22281 • Published Sep 26, 2025 • 33
MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning Paper • 2509.22281 • Published Sep 26, 2025 • 33 • 3
The Invisible Leash: Why RLVR May Not Escape Its Origin Paper • 2507.14843 • Published Jul 20, 2025 • 85
Light of Normals: Unified Feature Representation for Universal Photometric Stereo Paper • 2506.18882 • Published Jun 23, 2025 • 89