leondawn666's Collections: Multimodality
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
Paper
• 2506.23918
• Published • 90
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Paper
• 2504.16030
• Published • 36
Time Blindness: Why Video-Language Models Can't See What Humans Can?
Paper
• 2505.24867
• Published • 82
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper
• 2507.01006
• Published • 252
Scaling RL to Long Videos
Paper
• 2507.07966
• Published • 162
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Paper
• 2508.09736
• Published • 58
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Paper
• 2508.10433
• Published • 146
Thyme: Think Beyond Images
Paper
• 2508.11630
• Published • 81
Paper
• 2508.10104
• Published • 301
Paper
• 2508.11737
• Published • 113
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper
• 2509.26507
• Published • 549
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
Paper
• 2511.15065
• Published • 78
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Paper
• 2511.04570
• Published • 242