UniDDT: Unifying Multimodal Understanding and Generation with Decoupled Diffusion Transformer Paper • 2606.16255 • Published 6 days ago
HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers Paper • 2606.13289 • Published 10 days ago
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision Paper • 2512.01342 • Published Dec 1, 2025
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions Paper • 2511.03334 • Published Nov 5, 2025
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation Paper • 2511.19320 • Published Nov 24, 2025