SAVE - a Yofuria Collection

Updated 4 days ago

The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement

Yofuria/UltraFeedback-binarized-ms-swift-1024

Updated 24 days ago • 38.9k • 63

Yofuria/UltraFeedback-ms-swift-1024

Updated Apr 27 • 41k • 160

Yofuria/Skywork-Reward-Preference-80K-v0.2-ms-swift

Updated Nov 18, 2025 • 77k • 5