Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning Paper • 2510.20150 • Published Oct 23, 2025 • 6
E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models Paper • 2601.00423 • Published Jan 1, 2026 • 11
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE Paper • 2507.21802 • Published Jul 29, 2025 • 19
TL-GRPO: Turn-Level RL for Reasoning-Guided Iterative Optimization Paper • 2601.16480 • Published 28 days ago • 52
Flow-GRPO: Training Flow Matching Models via Online RL Paper • 2505.05470 • Published May 8, 2025 • 88
GRPO++: Enhancing Dermatological Reasoning under Low Resource Settings Paper • 2510.01236 • Published Sep 23, 2025 • 1
Self-Generated Critiques Boost Reward Modeling for Language Models Paper • 2411.16646 • Published Nov 25, 2024 • 1
OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment Paper • 2510.07743 • Published Oct 9, 2025 • 11
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space Paper • 2511.20102 • Published Nov 25, 2025 • 28
O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents Paper • 2511.13593 • Published Nov 17, 2025 • 27
CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning Paper • 2511.18659 • Published Nov 24, 2025 • 24
Pillar-0: A New Frontier for Radiology Foundation Models Paper • 2511.17803 • Published Nov 21, 2025 • 24
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO Paper • 2511.13288 • Published Nov 17, 2025 • 19
Monet: Reasoning in Latent Visual Space Beyond Images and Language Paper • 2511.21395 • Published Nov 26, 2025 • 18
Fara-7B: An Efficient Agentic Model for Computer Use Paper • 2511.19663 • Published Nov 24, 2025 • 15
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs Paper • 2511.19773 • Published Nov 24, 2025 • 10