Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
Paper • 2510.20150 • Published • 5
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper • 2511.06221 • Published • 133
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Paper • 2508.10433 • Published • 144
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 102
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper • 2511.22570 • Published • 90
GARDO: Reinforcing Diffusion Models without Reward Hacking
Paper • 2512.24138 • Published • 29
Controlled Self-Evolution for Algorithmic Code Optimization
Paper • 2601.07348 • Published • 113
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper • 2601.18778 • Published • 34
Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow
Paper • 2601.14243 • Published • 19
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
Paper • 2601.16973 • Published • 36
Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
Paper • 2601.11258 • Published • 5
RL's Razor: Why Online Reinforcement Learning Forgets Less
Paper • 2509.04259 • Published • 6
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper • 2504.13837 • Published • 139