Zhongwei Zhang

zzwustc

zzw-ustc

AI & ML interests

AIGC

Recent Activity

liked a model 4 days ago

Lightricks/LTX-2.3-22b-IC-LoRA-Clean-Plate

liked a dataset 23 days ago

markov-ai/gaming-500-hours

upvoted a paper 26 days ago

Vera: A Layered Diffusion Model for Content-Preserving Video Editing

Paper • 2606.23610 • Published Jun 22 • 12

upvoted 8 papers about 2 months ago

Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

Paper • 2606.03988 • Published Jun 3 • 126

MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation

Paper • 2606.09056 • Published Jun 8 • 6

minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models

Paper • 2605.30263 • Published May 28 • 59

CoVEBench: Can Video Editing Models Handle Complex Instructions?

Paper • 2606.08415 • Published Jun 7 • 52

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

Paper • 2605.15178 • Published May 14 • 91

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

Paper • 2605.13724 • Published May 13 • 105

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

Paper • 2605.18739 • Published May 18 • 116

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Paper • 2605.12500 • Published May 12 • 194

upvoted a paper 4 months ago

Helios: Real Real-Time Long Video Generation Model

Paper • 2603.04379 • Published Mar 4 • 190

upvoted 2 papers 5 months ago

Mode Seeking meets Mean Seeking for Fast Long Video Generation

Paper • 2602.24289 • Published Feb 27 • 41

Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

Paper • 2603.02175 • Published Mar 2 • 24

upvoted an article 5 months ago

Designing Positional Encoding

Article • FL33TW00D-HF • Nov 25, 2024 • 29

upvoted a paper 5 months ago

Code2World: A GUI World Model via Renderable Code Generation

Paper • 2602.09856 • Published Feb 10 • 201

upvoted a paper 6 months ago

Urban Socio-Semantic Segmentation with Vision-Language Reasoning

Paper • 2601.10477 • Published Jan 15 • 155

upvoted 2 papers 7 months ago

Region-Constraint In-Context Generation for Instructional Video Editing

Paper • 2512.17650 • Published Dec 19, 2025 • 53

SAM Audio: Segment Anything in Audio

Paper • 2512.18099 • Published Dec 19, 2025 • 25

upvoted a paper 9 months ago

FARMER: Flow AutoRegressive Transformer over Pixels

Paper • 2510.23588 • Published Oct 27, 2025 • 59

upvoted a paper 10 months ago

Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation

Paper • 2510.01284 • Published Sep 30, 2025 • 37

upvoted an article 12 months ago