Huixin Zhang

ZhangHuixin

ZhangHuixin1103

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 2 months ago

SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation

Paper • 2603.16864 • Published Mar 17 • 17

upvoted a paper 3 months ago

PISCO: Precise Video Instance Insertion with Sparse Control

Paper • 2602.08277 • Published Feb 9 • 13

upvoted an article 4 months ago

Vision Language Models (Better, faster, stronger)

Article • merve, sergiopaniego, ariG23498, pcuenq, andito • May 12, 2025 • 613

upvoted a paper 4 months ago

T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs

Paper • 2511.16107 • Published Nov 20, 2025 • 2

upvoted a paper 6 months ago

VIDEOP2R: Video Understanding from Perception to Reasoning

Paper • 2511.11113 • Published Nov 14, 2025 • 112

upvoted a paper 9 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25, 2025 • 222

updated 2 models 9 months ago

ZhangHuixin/llama-3.1-8b-math

8B • Updated Aug 21, 2025

ZhangHuixin/llama-3-8b-math

8B • Updated Aug 21, 2025

published 2 models 9 months ago

ZhangHuixin/llama-3.1-8b-math

8B • Updated Aug 21, 2025

ZhangHuixin/llama-3-8b-math

8B • Updated Aug 21, 2025

liked a Space 10 months ago

vggt

Space • VGGT (CVPR 2025) • 472

upvoted a paper 10 months ago

MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding

Paper • 2507.12463 • Published Jul 16, 2025 • 27

upvoted a paper 11 months ago

4KAgent: Agentic Any Image to 4K Super-Resolution

Paper • 2507.07105 • Published Jul 9, 2025 • 107

upvoted 2 papers 12 months ago

MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning

Paper • 2505.24871 • Published May 30, 2025 • 23

DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models

Paper • 2505.24025 • Published May 29, 2025 • 27

liked a Space about 1 year ago

New Test

Space • Generate human-object interacting videos with anchor images

upvoted an article about 1 year ago

Llama can now see and run on your device - welcome Llama 3.2

Article • merve, philschmid, osanseviero, reach-vb, lewtun, ariG23498, pcuenq • Sep 25, 2024 • 191