BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation • Paper • 2602.09849 • Published 4 days ago • 15
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding • Paper • 2511.13026 • Published Nov 17, 2025 • 26
HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration • Paper • 2510.27266 • Published Oct 31, 2025 • 21
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle • Paper • 2508.05612 • Published Aug 7, 2025 • 2