Penghui Qi

QPHutu

AI & ML interests

None yet

Recent Activity

authored a paper about 3 hours ago

Rethinking the Trust Region in LLM Reinforcement Learning

Paper • 2602.04879 • Published about 18 hours ago • 22

upvoted a paper about 10 hours ago

Rethinking the Trust Region in LLM Reinforcement Learning

Paper • 2602.04879 • Published about 18 hours ago • 22

submitted a paper to Daily Papers about 10 hours ago

Rethinking the Trust Region in LLM Reinforcement Learning

Paper • 2602.04879 • Published about 18 hours ago • 22

authored a paper 3 days ago

Revisiting Parameter Server in LLM Post-Training

Paper • 2601.19362 • Published 9 days ago • 7

upvoted a paper 8 days ago

Revisiting Parameter Server in LLM Post-Training

Paper • 2601.19362 • Published 9 days ago • 7

liked 2 datasets 3 months ago

LLM360/guru-RL-92k

Viewer • Updated Aug 20, 2025 • 91.9k • 866 • 43

zwhe99/DeepMath-103K

Viewer • Updated May 29, 2025 • 103k • 9.71k • 348

updated a dataset 3 months ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated Nov 15, 2025 • 1.52k • 36 • 7

liked a dataset 3 months ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated Nov 15, 2025 • 1.52k • 36 • 7

updated a collection 3 months ago

Precision-RL

Collection

Defeating the Training-Inference Mismatch via FP16 • 2 items • Updated Nov 14, 2025

published a dataset 3 months ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated Nov 15, 2025 • 1.52k • 36 • 7

updated a collection 3 months ago

Precision-RL

Collection

Defeating the Training-Inference Mismatch via FP16 • 2 items • Updated Nov 14, 2025

liked a model 3 months ago

zz1358m/SofT-GRPO-master

Updated Nov 13, 2025 • 8

upvoted a paper 3 months ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 129

authored a paper 3 months ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published Oct 30, 2025 • 30

upvoted a paper 3 months ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published Oct 30, 2025 • 30

commented on a paper 3 months ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published Oct 30, 2025 • 30

upvoted 2 papers 4 months ago

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26, 2025 • 70

Variational Reasoning for Language Models

Paper • 2509.22637 • Published Sep 26, 2025 • 69

liked a dataset 4 months ago

SynthLabsAI/Big-Math-RL-Verified

Viewer • Updated Mar 25, 2025 • 251k • 6.07k • 218
