Predict human preference for LLM responses.
Binfeng Xu
billxbf
AI & ML interests
evolving back to apes
Recent Activity
upvoted a paper 1 day ago
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
upvoted a paper about 1 month ago
PhyCritic: Multimodal Critic Models for Physical AI
upvoted a paper about 2 months ago
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text