Xinlong Chen

XinlongChen

https://xlchen0205.github.io/

upvoted 5 papers about 1 month ago

TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions

Paper • 2602.08711 • Published Feb 9 • 28

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

Paper • 2602.04804 • Published Feb 4 • 46

Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Paper • 2602.03510 • Published Feb 3 • 27

3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation

Paper • 2602.03796 • Published Feb 3 • 62

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Paper • 2602.01630 • Published Feb 2 • 47

upvoted a paper about 2 months ago

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

Paper • 2601.10061 • Published Jan 15 • 31

upvoted 2 papers 2 months ago

GARDO: Reinforcing Diffusion Models without Reward Hacking

Paper • 2512.24138 • Published Dec 30, 2025 • 30

GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

Paper • 2512.15560 • Published Dec 17, 2025 • 25

upvoted 4 papers 3 months ago

SemanticGen: Video Generation in Semantic Space

Paper • 2512.20619 • Published Dec 23, 2025 • 93

Kling-Omni Technical Report

Paper • 2512.16776 • Published Dec 18, 2025 • 172

VABench: A Comprehensive Benchmark for Audio-Video Generation

Paper • 2512.09299 • Published Dec 10, 2025 • 8

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

Paper • 2512.12675 • Published Dec 14, 2025 • 41

upvoted a paper 5 months ago

MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning

Paper • 2510.14265 • Published Oct 16, 2025 • 20

authored 6 papers 5 months ago

Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models

Paper • 2501.09997 • Published Jan 17, 2025

Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14, 2025 • 30

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

Paper • 2505.21333 • Published May 27, 2025 • 38

VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation

Paper • 2502.12782 • Published Feb 18, 2025

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark

Paper • 2509.24897 • Published Sep 29, 2025 • 46

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration

Paper • 2510.10395 • Published Oct 12, 2025 • 31

upvoted a paper 5 months ago

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration

Paper • 2510.10395 • Published Oct 12, 2025 • 31