AI & ML interests
None yet
Recent Activity
Organizations
None yet
xiaoyuanliu/StateLM-14B-RL-0124-CKPT32
Text Generation
• 15B • Updated • 2
xiaoyuanliu/StateLM-8B-RL-0123-CKPT32
Text Generation
• 8B • Updated
xiaoyuanliu/StateLM-4B-SFT
Text Generation
• 4B • Updated
xiaoyuanliu/StateLM-14B-SFT
Text Generation
• 15B • Updated • 4
xiaoyuanliu/StateLM-8B-SFT
Text Generation
• 8B • Updated
xiaoyuanliu/Qwen3-30B-A3B-SFT-V4_OPT
Text Generation
• 31B • Updated • 1
xiaoyuanliu/Qwen2.5-1.5B-simplerl-ppo-verifier
Text Generation
• 2B • Updated
xiaoyuanliu/Qwen2.5-3B-simplerl-ppo-verifier
Text Generation
• 3B • Updated • 1
xiaoyuanliu/Qwen2.5-7B-simplerl-ppo-verifier
Text Generation
• 8B • Updated • 4
xiaoyuanliu/Qwen3-4B-SFT-V2.1-ml.16K-lr.1e-5-ep.3
Text Generation
• 4B • Updated • 1
xiaoyuanliu/Qwen3-8B-SFT-V2.1-ml.16K-lr.1e-5-ep.3
Text Generation
• 8B • Updated • 2
xiaoyuanliu/Qwen3-8B-SFT-V2.1-ml.16K-lr.1e-5-ep1
Updated
xiaoyuanliu/Qwen2.5-7B-BBH-PPO
Text Generation
• 8B • Updated • 5
xiaoyuanliu/Qwen2.5-7B-BBH-PPO-RISE
Text Generation
• 8B • Updated • 6
xiaoyuanliu/Qwen2.5-3B-LogiQA-PPO-Step84
Text Generation
• 3B • Updated
xiaoyuanliu/Qwen2.5-3B-LogiQA-PPO-RISE-Step84
Text Generation
• 3B • Updated • 2
xiaoyuanliu/Qwen2.5-3B-BBH-PPO-RISE-Step60
Text Generation
• 3B • Updated • 2
xiaoyuanliu/Qwen2.5-3B-BBH-PPO-Step60
Text Generation
• 3B • Updated • 1
xiaoyuanliu/Qwen2.5-3B-BBH-PPO
Text Generation
• 3B • Updated • 1
xiaoyuanliu/Qwen2.5-3B-BBH-PPO-RISE
Text Generation
• 3B • Updated
xiaoyuanliu/Qwen2.5-3B-LogiQA-PPO
Text Generation
• 3B • Updated • 1
xiaoyuanliu/Qwen2.5-3B-LogiQA-PPO-RISE
Text Generation
• 3B • Updated • 1
xiaoyuanliu/Qwen3-4B-Base-DeepMath10K-PPO-SV
4B • Updated
xiaoyuanliu/Qwen3-4B-Base-DeepMath10K-PPO
4B • Updated • 1
xiaoyuanliu/Qwen3-8B-Base-DeepMath10K-PPO
8B • Updated • 2
xiaoyuanliu/Qwen3-8B-Base-DeepMath10K-PPO-SV
8B • Updated
xiaoyuanliu/Qwen2.5-7B-Instruct-DeepMath10K-PPO
Text Generation
• 8B • Updated • 2
xiaoyuanliu/Qwen2.5-7B-Instruct-MathHard-PPO-012
Text Generation
• 8B • Updated • 1
xiaoyuanliu/Qwen2.5-7B-Instruct-MathHard-PPO-RISE012
Text Generation
• 8B • Updated • 4
xiaoyuanliu/Qwen2.5-1.5B-simplerl-ppo-online.critique-100-3k
Text Generation
• 2B • Updated • 1