Rethinking the Trust Region in LLM Reinforcement Learning Paper • 2602.04879 • Published about 18 hours ago • 22
Precision-RL Collection Defeating the Training-Inference Mismatch via FP16 • 2 items • Updated Nov 14, 2025
Defeating the Training-Inference Mismatch via FP16 Paper • 2510.26788 • Published Oct 30, 2025 • 30 • 1
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70