Submitted by Tianci Liu | 15 | Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training | OpenRubrics | 2
Submitted by Tianci Liu | 13 | OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment | OpenRubrics | 2