Yofuria's Collections
SAVE
UAPO
PoliCon
ICE
SAVE
Updated 4 days ago
The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement
Yofuria/UltraFeedback-binarized-ms-swift-1024 • Updated 24 days ago • 38.9k • 63
Yofuria/UltraFeedback-ms-swift-1024 • Updated Apr 27 • 41k • 160
Yofuria/Skywork-Reward-Preference-80K-v0.2-ms-swift • Updated Nov 18, 2025 • 77k • 5