Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling Paper • 2606.12370 • Published 10 days ago • 21
Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination Paper • 2605.31058 • Published 22 days ago • 2
Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding Paper • 2605.29707 • Published 23 days ago • 147
LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents Paper • 2605.29559 • Published 23 days ago • 17
MolmoAct2: Action Reasoning Models for Real-world Deployment Paper • 2605.02881 • Published May 4 • 353
GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment Paper • 2605.19577 • Published May 19 • 59
Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding Paper • 2605.20104 • Published May 19 • 7
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published May 13 • 163
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled Image-Text-to-Text • 28B • Updated Apr 6 • 148k • • 2.88k
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? Paper • 2603.24472 • Published Mar 25 • 57
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation Paper • 2603.19220 • Published Mar 19 • 69