Collections

Discover the best community collections!

Collections including paper arxiv:2602.20159

Video understanding

Wolf: Captioning Everything with a World Summarization Framework

Paper • 2407.18908 • Published Jul 26, 2024 • 32
Mixture of Nested Experts: Adaptive Processing of Visual Tokens

Paper • 2407.19985 • Published Jul 29, 2024 • 37
TPDiff: Temporal Pyramid Video Diffusion Model

Paper • 2503.09566 • Published Mar 12, 2025 • 45
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO

Paper • 2506.07464 • Published Jun 9, 2025 • 14

VBVR: A Very Big Video Reasoning Suite

Video-Reason/VBVR-Wan2.2

Image-to-Video • Updated 7 days ago • 212 • 127
Video-Reason/VBVR-Dataset

Viewer • Updated 21 days ago • 1M • 1.75k • 53
Video-Reason/VBVR-Bench-Data

Viewer • Updated 21 days ago • 500 • 794 • 9
Video-Reason/video-mcp

Viewer • Updated 21 days ago • 3.91k • 250 • 1

Qwen3-TTS Technical Report

Paper • 2601.15621 • Published Jan 22 • 75
PaperBanana: Automating Academic Illustration for AI Scientists

Paper • 2601.23265 • Published Jan 30 • 225
Moonshine: Speech Recognition for Live Transcription and Voice Commands

Paper • 2410.15608 • Published Oct 21, 2024 • 12
PersonaLive! Expressive Portrait Image Animation for Live Streaming

Paper • 2512.11253 • Published Dec 12, 2025 • 40

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

Paper • 2602.20161 • Published Feb 23 • 23
A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 519
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published about 1 month ago • 123
AURA: Always-On Understanding and Real-Time Assistance via Video Streams

Paper • 2604.04184 • Published 18 days ago • 50

Endless Terminals: Scaling RL Environments for Terminal Agents

Paper • 2601.16443 • Published Jan 23 • 18
Linear representations in language models can change dramatically over a conversation

Paper • 2601.20834 • Published Jan 28 • 21
Scaling Embeddings Outperforms Scaling Experts in Language Models

Paper • 2601.21204 • Published Jan 29 • 102
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

Paper • 2601.18778 • Published Jan 26 • 42

GLM-5: from Vibe Coding to Agentic Engineering

Paper • 2602.15763 • Published Feb 17 • 145
Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning

Paper • 2602.07845 • Published Feb 8 • 71
LLaDA2.1: Speeding Up Text Diffusion via Token Editing

Paper • 2602.08676 • Published Feb 9 • 71
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

Paper • 2602.02474 • Published Feb 2 • 63

video-understanding

A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 519

A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 519
Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Paper • 2602.08354 • Published Feb 9 • 264

Interesting Papers

Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

Paper • 2602.18422 • Published Feb 20 • 30
A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 519

LongCat-Flash-Thinking-2601 Technical Report

Paper • 2601.16725 • Published Jan 23 • 180
QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining

Paper • 2602.07085 • Published Feb 6 • 190
A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 519
AI Can Learn Scientific Taste

Paper • 2603.14473 • Published Mar 15 • 424
