AutoMIA: Improved Baselines for Membership Inference Attack via Agentic Self-Exploration Paper • 2604.01014 • Published 3 days ago • 5
TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation Paper • 2603.19039 • Published 16 days ago • 50
ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer Paper • 2603.15478 • Published 18 days ago • 24
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models Paper • 2603.15557 • Published 18 days ago • 28
In-Video Instructions: Visual Signals as Generative Control Paper • 2511.19401 • Published Nov 24, 2025 • 32
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published Nov 11, 2025 • 39
Discrete Diffusion LLM & MLLM Collection An collection of research/models in discrete diffusion large language and multimodal models • 57 items • Updated Jun 17, 2025 • 4
Discrete Diffusion in Large Language and Multimodal Models: A Survey Paper • 2506.13759 • Published Jun 16, 2025 • 43
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding Paper • 2505.16990 • Published May 22, 2025 • 22
EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models Paper • 2506.01667 • Published Jun 2, 2025 • 21