Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing • Paper • 2603.12254 • Published 14 days ago • 20
Tinted Frames: Question Framing Blinds Vision-Language Models • Paper • 2603.19203 • Published 7 days ago • 16
Reconstruction Alignment Improves Unified Multimodal Models • Paper • 2509.07295 • Published Sep 8, 2025 • 40