SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph Paper • 2301.01949 • Published Jan 5, 2023
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities Paper • 2211.06679 • Published Nov 12, 2022 • 2
AltDiffusion: A Multilingual Text-to-Image Diffusion Model Paper • 2308.09991 • Published Aug 19, 2023 • 3
InstructX: Towards Unified Visual Editing with MLLM Guidance Paper • 2510.08485 • Published Oct 9, 2025 • 18
DreamID-V:Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer Paper • 2601.01425 • Published Jan 4 • 53
DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation Paper • 2602.12160 • Published Feb 12 • 38
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 3 days ago • 133
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published 3 days ago • 133
Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models Paper • 2603.22212 • Published 25 days ago • 126
DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation Paper • 2602.12160 • Published Feb 12 • 38
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer Paper • 2601.14250 • Published Jan 20 • 48 • 5
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer Paper • 2601.14250 • Published Jan 20 • 48
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer Paper • 2601.14250 • Published Jan 20 • 48
DreamID-V:Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer Paper • 2601.01425 • Published Jan 4 • 53
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models Paper • 2509.17627 • Published Sep 22, 2025 • 66
UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward Paper • 2509.06818 • Published Sep 8, 2025 • 29
USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning Paper • 2508.18966 • Published Aug 26, 2025 • 56
Phantom-Data : Towards a General Subject-Consistent Video Generation Dataset Paper • 2506.18851 • Published Jun 23, 2025 • 30
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors Paper • 2505.24625 • Published May 30, 2025 • 9