- Wolf: Captioning Everything with a World Summarization Framework
  Paper • 2407.18908 • Published • 32
- Mixture of Nested Experts: Adaptive Processing of Visual Tokens
  Paper • 2407.19985 • Published • 37
- TPDiff: Temporal Pyramid Video Diffusion Model
  Paper • 2503.09566 • Published • 45
- DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
  Paper • 2506.07464 • Published • 14
Collections
Collections including paper arxiv:2602.20159
- Qwen3-TTS Technical Report
  Paper • 2601.15621 • Published • 75
- PaperBanana: Automating Academic Illustration for AI Scientists
  Paper • 2601.23265 • Published • 225
- Moonshine: Speech Recognition for Live Transcription and Voice Commands
  Paper • 2410.15608 • Published • 12
- PersonaLive! Expressive Portrait Image Animation for Live Streaming
  Paper • 2512.11253 • Published • 40

- Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
  Paper • 2602.20161 • Published • 23
- A Very Big Video Reasoning Suite
  Paper • 2602.20159 • Published • 519
- Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model
  Paper • 2603.21986 • Published • 123
- AURA: Always-On Understanding and Real-Time Assistance via Video Streams
  Paper • 2604.04184 • Published • 50

- Endless Terminals: Scaling RL Environments for Terminal Agents
  Paper • 2601.16443 • Published • 18
- Linear representations in language models can change dramatically over a conversation
  Paper • 2601.20834 • Published • 21
- Scaling Embeddings Outperforms Scaling Experts in Language Models
  Paper • 2601.21204 • Published • 102
- Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
  Paper • 2601.18778 • Published • 42

- GLM-5: from Vibe Coding to Agentic Engineering
  Paper • 2602.15763 • Published • 145
- Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning
  Paper • 2602.07845 • Published • 71
- LLaDA2.1: Speeding Up Text Diffusion via Token Editing
  Paper • 2602.08676 • Published • 71
- MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
  Paper • 2602.02474 • Published • 63

- LongCat-Flash-Thinking-2601 Technical Report
  Paper • 2601.16725 • Published • 180
- QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
  Paper • 2602.07085 • Published • 190
- A Very Big Video Reasoning Suite
  Paper • 2602.20159 • Published • 519
- AI Can Learn Scientific Taste
  Paper • 2603.14473 • Published • 424