- Scaling RL to Long Videos
  Paper • 2507.07966 • Published • 160
- Group Sequence Policy Optimization
  Paper • 2507.18071 • Published • 317
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
  Paper • 2507.14111 • Published • 25
- MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
  Paper • 2507.21183 • Published • 15
laner ten
that113
AI & ML interests
None yet
Recent Activity
- upvoted an article 7 days ago: Proximal Policy Optimization (PPO)
- upvoted a collection about 2 months ago: RLHF Papers
- updated a collection about 2 months ago: re paper
Organizations
None yet