GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published 4 days ago • 9
ViVa: A Video-Generative Value Model for Robot Reinforcement Learning Paper • 2604.08168 • Published 3 days ago • 12
Small Vision-Language Models are Smart Compressors for Long Video Understanding Paper • 2604.08120 • Published 3 days ago • 13
FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On Paper • 2604.08526 • Published 3 days ago • 18
MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping Paper • 2604.08364 • Published 3 days ago • 89