A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning Paper • 2604.03995 • Published 8 days ago • 4
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 18 days ago • 96
DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers Paper • 2602.16968 • Published Feb 19 • 12
Some Modalities are More Equal Than Others: Decoding and Architecting Multimodal Integration in MLLMs Paper • 2511.22826 • Published Nov 28, 2025 • 8
Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos Paper • 2512.01803 • Published Dec 1, 2025 • 5