Efficient RLVR Training via Weighted Mutual Information Data Selection Paper • 2603.01907 • Published 2 days ago • 13
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters Paper • 2405.16287 • Published May 25, 2024 • 11
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models Paper • 2602.17684 • Published 28 days ago • 21
CHARM: Calibrating Reward Models With Chatbot Arena Scores Paper • 2504.10045 • Published Apr 14, 2025