Learn2Fold: Structured Origami Generation with World Model Planning Paper • 2603.29585 • Published Feb 2 • 15
GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation Paper • 2603.26661 • Published 9 days ago • 19
MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation Paper • 2603.29029 • Published 6 days ago • 13
EgoSim: Egocentric World Simulator for Embodied Interaction Generation Paper • 2604.01001 • Published 5 days ago • 34
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis Paper • 2603.29620 • Published 5 days ago • 45
Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification Paper • 2603.26648 • Published 9 days ago • 40
ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners? Paper • 2603.25823 • Published 10 days ago • 42
CutClaw: Agentic Hours-Long Video Editing via Music Synchronization Paper • 2603.29664 • Published 5 days ago • 44
VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward Paper • 2603.26599 • Published 9 days ago • 58