The Verification Horizon: No Silver Bullet for Coding Agent Rewards Paper • 2606.26300 • Published 4 days ago • 38 • 3
Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models Paper • 2602.04649 • Published Feb 4 • 13
Secrets of RLHF in Large Language Models Part II: Reward Modeling Paper • 2401.06080 • Published Jan 11, 2024 • 27
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment Paper • 2410.09893 • Published Oct 13, 2024
Secrets of RLHF in Large Language Models Part I: PPO Paper • 2307.04964 • Published Jul 11, 2023 • 30