Hugging Face
4
wang
wzx111
AI & ML interests
None yet
Organizations
None yet
wzx111's activity
updated
a model
9 days ago
wzx111/14B-Aggressive-OPO-Delta-LR2e-6-G32
Updated
9 days ago
published
a model
9 days ago
wzx111/14B-Aggressive-OPO-Delta-LR2e-6-G32
Updated
9 days ago
updated
a model
9 days ago
wzx111/14B-Aggressive-GSPO-LR2e-6-G32
Updated
9 days ago
published
a model
9 days ago
wzx111/14B-Aggressive-GSPO-LR2e-6-G32
Updated
9 days ago
New activity in
wzx111/Qwen3-1.7B-MATH-GDPO
3 months ago
Which post-training method was actually used for this model, GDPO or GRPO?
1
#1 opened 3 months ago by
roseblooming
updated
a dataset
3 months ago
wzx111/MATH-lighteval-level3
Viewer
•
Updated
Dec 9, 2025
•
2.72k
•
14
published
a dataset
3 months ago
wzx111/MATH-lighteval-level3
Viewer
•
Updated
Dec 9, 2025
•
2.72k
•
14
published
a model
4 months ago
wzx111/Qwen3-1.7B-GRPO-math
Updated
Nov 29, 2025
updated
a model
4 months ago
wzx111/Qwen3-1.7B-GRPO-math
Updated
Nov 29, 2025
updated
a dataset
4 months ago
wzx111/MATH-lighteval-level-middlehigh
Viewer
•
Updated
Nov 24, 2025
•
5.63k
•
9
published
a dataset
4 months ago
wzx111/MATH-lighteval-level-middlehigh
Viewer
•
Updated
Nov 24, 2025
•
5.63k
•
9
updated
a dataset
4 months ago
wzx111/MATH-lighteval-level-middle
Viewer
•
Updated
Nov 24, 2025
•
7.87k
•
10
published
a dataset
4 months ago
wzx111/MATH-lighteval-level-middle
Viewer
•
Updated
Nov 24, 2025
•
7.87k
•
10
updated
a model
4 months ago
wzx111/Qwen3-1.7B-Open-R1-ADPO
Text Generation
•
2B
•
Updated
Nov 23, 2025
•
1
published
a model
4 months ago
wzx111/Qwen3-1.7B-Open-R1-ADPO
Text Generation
•
2B
•
Updated
Nov 23, 2025
•
1
updated
a model
4 months ago
wzx111/Qwen3-1.7B-Open-R1-GRPO-Baseline
Text Generation
•
2B
•
Updated
Nov 22, 2025
•
1
published
a model
4 months ago
wzx111/Qwen3-1.7B-Open-R1-GRPO-Baseline
Text Generation
•
2B
•
Updated
Nov 22, 2025
•
1
New activity in
Qwen/Qwen3-235B-A22B
10 months ago
Does the reward function lack an n-gram repetition penalty?
2
#7 opened 11 months ago by
wzx111
updated
a model
10 months ago
wzx111/Qwen3-1.7B-Open-R1-GRPO
2B
•
Updated
May 14, 2025
published
a model
10 months ago
wzx111/Qwen3-1.7B-Open-R1-GRPO
2B
•
Updated
May 14, 2025