Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing Paper • 2603.12254 • Published 14 days ago • 20
SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning Paper • 2603.22057 • Published 3 days ago • 41
Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model Paper • 2603.05438 • Published 21 days ago • 39