li sheng

bambisheng

https://github.com/BambiSheng

AI & ML interests

None yet

Recent Activity

upvoted a collection 1 day ago

ZEDA

Collection

4 items • Updated 1 day ago • 3

authored a paper 1 day ago

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Paper • 2605.18643 • Published 3 days ago • 28

upvoted a paper 1 day ago

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Paper • 2605.18643 • Published 3 days ago • 28

updated a dataset 2 days ago

TsinghuaC3I/ZEDA-Evaluation

Preview • Updated 2 days ago • 49

published a dataset 2 days ago

TsinghuaC3I/ZEDA-Evaluation

Preview • Updated 2 days ago • 49

upvoted a paper about 1 month ago

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Paper • 2604.13016 • Published Apr 14 • 105

upvoted a paper about 2 months ago

Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Paper • 2511.08577 • Published Nov 11, 2025 • 110

updated a dataset about 2 months ago

dynn-datasets/Evaluation

Preview • Updated Mar 24 • 93

published a dataset 2 months ago

dynn-datasets/Evaluation

Preview • Updated Mar 24 • 93

upvoted a paper 2 months ago

How Far Can Unsupervised RLVR Scale LLM Training?

Paper • 2603.08660 • Published Mar 9 • 59

upvoted a paper 7 months ago

Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published Oct 29, 2025 • 229

upvoted a paper 8 months ago

rStar2-Agent: Agentic Reasoning Technical Report

Paper • 2508.20722 • Published Aug 28, 2025 • 118

upvoted a paper 9 months ago

SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published Aug 14, 2025 • 97

upvoted a paper 12 months ago

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28, 2025 • 132

authored a paper about 1 year ago

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22, 2025 • 122

upvoted a paper about 1 year ago

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22, 2025 • 122

published 3 models about 1 year ago

updated a model about 1 year ago

bambisheng/UltraIF-8B-DPO

Text Generation • 8B • Updated Apr 3, 2025 • 5 • 3
