PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training Paper • 2503.06486 • Published Mar 9, 2025
WorldOlympiad: Can Your World Model Survive a Triathlon? Paper • 2606.11129 • Published 4 days ago • 30
MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism Paper • 2606.07512 • Published 8 days ago • 38
Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching Paper • 2606.03577 • Published 11 days ago • 16
Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration? Paper • 2606.01247 • Published 13 days ago • 30
TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction Paper • 2605.26115 • Published 19 days ago • 52
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer Paper • 2510.06590 • Published Oct 8, 2025 • 78
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Paper • 2509.12201 • Published Sep 15, 2025 • 107
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks Paper • 2502.17157 • Published Feb 24, 2025 • 52