Ray Yang

rayruiyang

Yangr116

AI & ML interests

None yet

Organizations

None yet

upvoted a paper 16 days ago

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

Paper • 2605.18739 • Published 17 days ago • 112

upvoted 2 papers 21 days ago

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

Paper • 2605.13779 • Published 22 days ago • 219

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context

Paper • 2605.13831 • Published 22 days ago • 87

upvoted a paper 22 days ago

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

Paper • 2605.12495 • Published 23 days ago • 35

liked a dataset about 1 month ago

jdopensource/JoyAI-Image-OpenSpatial

Viewer • Updated Apr 15 • 2.35M • 66.8k • 6

upvoted a paper about 2 months ago

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

Paper • 2604.04921 • Published Apr 6 • 114

upvoted 3 papers 2 months ago

ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?

Paper • 2603.25823 • Published Mar 26 • 43

MolmoPoint: Better Pointing for VLMs with Grounding Tokens

Paper • 2603.28069 • Published Mar 30 • 9

ProAct: Agentic Lookahead in Interactive Environments

Paper • 2602.05327 • Published Feb 5 • 27

updated a dataset 3 months ago

rayruiyang/vst_500k

Viewer • Updated Mar 13 • 563k • 8.85k • 4

upvoted 2 papers 3 months ago

Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence

Paper • 2603.07660 • Published Mar 8 • 87

Utonia: Toward One Encoder for All Point Clouds

Paper • 2603.03283 • Published Mar 3 • 185

updated a collection 4 months ago

VST

Collection

A comprehensive framework designed to cultivate VLMs with human-like visuospatial abilities. • 6 items • Updated Feb 1 • 6

updated a dataset 4 months ago

rayruiyang/vst_3d_grounding_benchmark

Preview • Updated Feb 1 • 52

published a dataset 4 months ago

rayruiyang/vst_3d_grounding_benchmark

Preview • Updated Feb 1 • 52

published a dataset 5 months ago

rayruiyang/vst_500k

Viewer • Updated Mar 13 • 563k • 8.85k • 4

upvoted a paper 6 months ago

DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning

Paper • 2512.12799 • Published Dec 14, 2025 • 12

upvoted a paper 7 months ago

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published Nov 12, 2025 • 216

updated 2 models 7 months ago

rayruiyang/VST-3B-SFT

Image-Text-to-Text • 4B • Updated Nov 11, 2025 • 118

rayruiyang/VST-3B-RL

Image-Text-to-Text • 4B • Updated Nov 11, 2025 • 208 • 3
