Hopper: The Optimizer That Learns Parallelism 2x Faster Than Adam
bird-of-paradise

This model is a research artifact from the blog post: Hopper: The Optimizer That Learns Parallelism 2x Faster Than Adam.
It was trained to benchmark Hopper (a modified Muon optimizer with ns_steps=1 + Variance Normalization) against AdamW in a Reinforcement Learning (DAPO) setting.
We found that standard Muon (ns_steps=5) causes entropy collapse in RL, while Adam (ns_steps=0) relies on linear heuristics.
This model (cp-200) demonstrates a unique property: it solves parallel reasoning tasks that Adam fails on, even though it is slightly clumsy at arithmetic.
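The Hopper update described above (Muon's Newton-Schulz orthogonalization restricted to a single step, followed by variance normalization) can be sketched roughly as below. This is an illustrative sketch only: `newton_schulz` follows the standard quintic iteration used by Muon, but the exact form of the variance normalization (here, dividing by the elementwise standard deviation) is an assumption, not the blog post's implementation.

```python
import numpy as np

def newton_schulz(G, steps=1, eps=1e-7):
    """Quintic Newton-Schulz iteration (as in Muon): pushes the
    singular values of G toward 1, approximately orthogonalizing it."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)
    transpose = X.shape[0] > X.shape[1]
    if transpose:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transpose else X

def hopper_step(momentum, lr=0.02, eps=1e-8):
    """Hypothetical Hopper update: one Newton-Schulz step (ns_steps=1)
    plus variance normalization. The normalization form is assumed."""
    O = newton_schulz(momentum, steps=1)
    O = O / (O.std() + eps)  # variance normalization (assumed form)
    return -lr * O
```

With ns_steps=5 the iteration orthogonalizes the momentum much more aggressively, which is the regime the post reports as causing entropy collapse in RL; ns_steps=1 keeps the update only partially orthogonalized.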
Base model: Qwen/Qwen2.5-0.5B-Instruct

The checkpoint was uploaded with:

```python
from huggingface_hub import HfApi, login

# 1. Login
login()

# 2. Upload only the essentials
api = HfApi()
repo_id = "JenWei/Llama-Hopper-Reasoning-v1"
checkpoint_path = "./hopper-cp-200"  # folder path for the checkpoint

print("🚀 Uploading ONLY inference files...")
api.upload_folder(
    folder_path=checkpoint_path,
    repo_id=repo_id,
    repo_type="model",
)
```
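Note that as written, `upload_folder` pushes every file in the checkpoint folder, including any optimizer or trainer state. `huggingface_hub` supports an `allow_patterns` argument on `upload_folder` to actually restrict the upload to inference files; the patterns behave like shell globs. A small local sketch of that filtering (the file names here are hypothetical, not the actual checkpoint contents):

```python
from fnmatch import fnmatch

# Hypothetical checkpoint contents; the real folder layout may differ.
files = [
    "model.safetensors", "config.json", "tokenizer.json",
    "optimizer.pt", "scheduler.pt",
]
# Glob patterns in the style of upload_folder's allow_patterns argument.
allow_patterns = ["*.safetensors", "config.json", "tokenizer*"]

kept = [f for f in files if any(fnmatch(f, p) for p in allow_patterns)]
print(kept)  # optimizer/scheduler state is filtered out
```

Passing the same patterns as `allow_patterns=...` to `api.upload_folder(...)` would keep the repo limited to the files needed for inference.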