Unlocking Feature Learning in Gated Delta Networks at Scale Paper • 2606.04048 • Published 4 days ago • 2
Robust Layerwise Scaling Rules by Proper Weight Decay Tuning Paper • 2510.15262 • Published Oct 17, 2025 • 6
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning Paper • 2505.17508 • Published May 23, 2025 • 8