olmo3-7b-grpo-weighted-mul-creativity-step7

Olmo3-7B trained with GRPO (Weighted Mul) on the Creativity dataset. Checkpoint: step 7 (final).

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository ID for this checkpoint
model_id = "Echoandland/olmo3-7b-grpo-weighted-mul-creativity-step7"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
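
The loading snippet above stops before generation. As a minimal sketch of one way to sample text from the checkpoint: the helper names (`build_generation_kwargs`, `generate`), the sampling values, and the choice to load in bfloat16 are illustrative assumptions, not settings documented for this model. The weights are stored as F32, so a 7B-parameter model needs roughly 28 GB in full precision; loading in bfloat16 about halves that at some cost in numerical precision.

```python
MODEL_ID = "Echoandland/olmo3-7b-grpo-weighted-mul-creativity-step7"


def build_generation_kwargs(max_new_tokens=128, temperature=0.8, top_p=0.95):
    """Return sampling settings for open-ended generation.

    The defaults are illustrative assumptions, not values from the model card.
    """
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,  # sample instead of greedy decoding
        "temperature": temperature,
        "top_p": top_p,
    }


def generate(prompt, model_id=MODEL_ID):
    """Load the checkpoint and return a sampled continuation of `prompt`."""
    # Imported here so the module can be inspected without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: bfloat16 is acceptable for inference; drop torch_dtype to keep F32.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, **build_generation_kwargs())

    # Keep only the newly generated tokens, not the echoed prompt.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `generate("Write a two-line poem about rivers.")` downloads the weights on first use and returns the sampled continuation as a string. The prompt is passed as plain text here; whether this checkpoint expects a chat template is not stated in the card.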

Format: Safetensors
Model size: 7B params
Tensor type: F32