Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 10 days ago • 105
The Trinity of Consistency as a Defining Principle for General World Models Paper • 2602.23152 • Published 19 days ago • 197
Running on Zero Featured 17 Qwen3 VL Video Grounding 🥠 17 Text-guided object tracking, point tracking, reasoning.
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published Feb 13 • 43
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published Feb 3 • 62
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models Paper • 2512.24165 • Published Dec 30, 2025 • 51
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published Dec 18, 2025 • 95
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models Paper • 2509.19803 • Published Sep 24, 2025 • 120
Energy-Based Transformers are Scalable Learners and Thinkers Paper • 2507.02092 • Published Jul 2, 2025 • 69
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 197