
Haozhan Shen

SZhanZ

AI & ML interests

None yet

Recent Activity

upvoted an article about 14 hours ago

VLX-Seek: Improving VLM Fine-Grained Perception via Region Reference Instead of Coordinate Generation

upvoted an article 1 day ago

VLX-Flow: Continuous Video Understanding for Real-Time Multimodal Interaction

upvoted a paper 5 days ago

KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking

upvoted an article about 14 hours ago

Article

VLX-Seek: Improving VLM Fine-Grained Perception via Region Reference Instead of Coordinate Generation

omlab • about 16 hours ago • 8

upvoted an article 1 day ago

Article

VLX-Flow: Continuous Video Understanding for Real-Time Multimodal Interaction

omlab • 1 day ago • 9

upvoted a paper 5 days ago

KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking

Paper • 2606.22807 • Published 6 days ago • 47

updated a dataset 17 days ago

SZhanZ/mmc4_jsonl

Updated 17 days ago • 91 • 1

upvoted a paper 26 days ago

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

Paper • 2605.28132 • Published May 27 • 25

upvoted 3 papers 3 months ago

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Paper • 2603.25158 • Published Mar 26 • 56

LMEB: Long-horizon Memory Embedding Benchmark

Paper • 2603.12572 • Published Mar 13 • 74

MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning

Paper • 2603.12266 • Published Mar 12 • 19

authored a paper 3 months ago

MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning

Paper • 2603.12266 • Published Mar 12 • 19

liked a dataset 3 months ago

KaLM-Embedding/LMEB

Viewer • Updated May 7 • 2.15M • 3.5k • 29

upvoted a paper 5 months ago

SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs

Paper • 2602.06040 • Published Feb 5 • 10

published a dataset 5 months ago

SZhanZ/mmc4_jsonl

Updated 17 days ago • 91 • 1

upvoted 4 papers 12 months ago

OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

Paper • 2406.16620 • Published Jun 24, 2024 • 3

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head

Paper • 2403.06892 • Published Mar 11, 2024 • 2

How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection

Paper • 2308.13177 • Published Aug 25, 2023 • 1

RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing

Paper • 2306.11300 • Published Jun 20, 2023 • 2

upvoted a collection 12 months ago

Multimodal Research

Collection

10 items • Updated Apr 14, 2025 • 4

upvoted 3 papers 12 months ago

GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection

Paper • 2312.15043 • Published Dec 22, 2023 • 2

VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations

Paper • 2207.00221 • Published Jul 1, 2022 • 2

OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network

Paper • 2209.05946 • Published Sep 10, 2022 • 2
