UniDDT: Unifying Multimodal Understanding and Generation with Decoupled Diffusion Transformer Paper • 2606.16255 • Published 6 days ago • 12
HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers Paper • 2606.13289 • Published 10 days ago • 28
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision Paper • 2512.01342 • Published Dec 1, 2025 • 21
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation Paper • 2511.19320 • Published Nov 24, 2025 • 43
Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment Paper • 2511.22345 • Published Nov 27, 2025 • 13
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training Paper • 2203.12602 • Published Mar 23, 2022 • 4
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Paper • 2504.15271 • Published Apr 21, 2025 • 69
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions Paper • 2511.03334 • Published Nov 5, 2025 • 54