Reward Under Attack: Analyzing the Robustness and Hackability of Process Reward Models Paper • 2603.06621 • Published Feb 20
S2D: Selective Spectral Decay for Quantization-Friendly Conditioning of Neural Activations Paper • 2602.14432 • Published Feb 16
XRPO: Pushing the limits of GRPO with Targeted Exploration and Exploitation Paper • 2510.06672 • Published Oct 8, 2025
CRoPS: A Training-Free Hallucination Mitigation Framework for Vision-Language Models Paper • 2601.00659 • Published Jan 2