WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces • Paper • 2606.09426 • Published 8 days ago • 99
InterleaveThinker: Reinforcing Agentic Interleaved Generation • Paper • 2606.13679 • Published 5 days ago • 77
From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning • Paper • 2606.07190 • Published 11 days ago • 34
Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs • Paper • 2605.30611 • Published 19 days ago • 193
Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention • Paper • 2605.29548 • Published 19 days ago • 11
KhaledReda/all-MiniLM-L6-test_model-pair_score • Sentence Similarity • 22.7M • Updated 14 days ago • 40 • 1
RiT: Vanilla Diffusion Transformers Suffice in Representation Space • Paper • 2605.21981 • Published 26 days ago • 10