From Pixels to Words -- Towards Native One-Vision Models at Scale Paper • 2605.28820 • Published 7 days ago • 70
NEO1_5 Collection From Pixels to Words -- Towards Native One-Vision Models at Scale • 3 items • Updated 6 days ago • 6
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence Paper • 2605.25979 • Published 9 days ago • 27
SpatialBench: Is Your Spatial Foundation Model an All-Round Player? Paper • 2605.27367 • Published 8 days ago • 70
PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects Paper • 2605.21572 • Published 14 days ago • 52
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published Jan 29 • 75
VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text? Paper • 2602.04802 • Published Feb 4 • 2
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published 22 days ago • 191
SenseNova-U1 Collection SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture • 9 items • Updated 6 days ago • 69