WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation • Paper • 2605.25874 • Published 23 days ago • 102
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence • Paper • 2605.12882 • Published May 13 • 271
From Context to Skills: Can Language Models Learn from Context Skillfully? • Paper • 2604.27660 • Published May 3 • 169
OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer • Paper • 2604.24762 • Published Apr 27 • 13
Adam's Law: Textual Frequency Law on Large Language Models • Paper • 2604.02176 • Published Apr 2 • 507