Dataset: ehovy/race
How to use Senshi5620/gemma-3-finetune with Unsloth Studio:
Unix shell:

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Senshi5620/gemma-3-finetune to start chatting

Windows PowerShell:

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Senshi5620/gemma-3-finetune to start chatting

Hosted, no install:

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Senshi5620/gemma-3-finetune to start chatting

To load the model in Python:

pip install unsloth

from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="Senshi5620/gemma-3-finetune",
    max_seq_length=2048,
)

This model is a fine-tuned version of Gemma-3 trained on the RACE (ReAding Comprehension from Examinations) dataset using Unsloth.
It specializes in multiple-choice reading comprehension tasks that require reasoning and explanation.
The model was optimized with Group Relative Policy Optimization (GRPO), using a custom reward function that encourages correct, confident, and concise answers ending with a line of the form `Final answer: <letter>`.

| Field | Description |
|---|---|
| Base Model | Gemma-3 |
| Fine-tuning Framework | Unsloth |
| Task Type | Multiple-Choice Reading Comprehension |
| Dataset | RACE |
| Language | English |
| Reward Function | Custom GRPO (mc_reward_grpo) |
| License | Gemma License |
| Training Objective | Reinforcement Fine-tuning for reasoning clarity and correctness |
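
For context, GRPO scores each sampled completion relative to the other completions drawn for the same prompt, instead of relying on a separate value model. The sketch below illustrates only that group-normalization step. `group_advantages` is a hypothetical helper name, the reward values are made up for illustration, and details such as the epsilon term or the exact standard-deviation estimator are assumptions that may differ from what the training engine does.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for a single prompt.

    Assumption: population standard deviation plus a small epsilon.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical completions for one question, already scored by a reward function.
rewards = [1.77, -0.73, 1.35, -1.5]
for r, a in zip(rewards, group_advantages(rewards)):
    print(f"reward={r:+.2f}  advantage={a:+.3f}")
```

Completions that score above the group mean get a positive advantage and are reinforced; those below the mean are discouraged. If every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.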
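import math
import re

# Matches "Final answer: B" or "final: b"; the captured letter is the prediction.
LETTER_RE = re.compile(r'final(?:\s*answer)?\s*:\s*([A-D])', re.IGNORECASE)
# Matches an optional stated confidence such as "85%".
CONF_RE = re.compile(r'(\d{1,3})\s*%')

CORRECT = 1.5     # base reward for the right letter
INCORRECT = -1.0  # base reward for a wrong letter
MALFORMED = -1.5  # base penalty when no final-answer letter is found

def mc_reward_grpo(completions, answer=None, **kwargs):
    penalty = float(kwargs.get("malformed_penalty", MALFORMED))
    texts = [str(c.get("content", c)) if isinstance(c, dict) else str(c) for c in completions or []]
    if not texts:
        return []  # nothing to score; also avoids a division by zero below
    # Broadcast the gold answer(s) so that there is one per completion.
    golds = [answer] if isinstance(answer, str) else list(answer or [None] * len(texts))
    golds = (golds * math.ceil(len(texts) / len(golds)))[:len(texts)]
    rewards = []
    for txt, g in zip(texts, golds):
        m = LETTER_RE.findall(txt or "")
        conf_match = CONF_RE.search(txt)
        confidence = float(conf_match.group(1)) / 100 if conf_match else 1.0
        length = len(txt.split())
        if not m:
            # Halve the penalty when the text at least mentions "final".
            reward = penalty * (0.5 if "final" in txt.lower() else 1)
        else:
            pred = m[-1].upper()  # the last stated answer wins
            reward = CORRECT if g is None or pred == str(g).upper() else INCORRECT
        # Assumption: the next two lines apply to every completion, not only
        # well-formed ones (the original indentation was not preserved).
        # The bonus favors short completions and reaches zero at 200 words.
        clarity_bonus = max(0, 1 - length / 200)
        reward = reward * confidence + 0.3 * clarity_bonus
        rewards.append(round(reward, 3))
    return rewards
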
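
As a worked example of the arithmetic for a well-formed completion (one in which a final-answer letter was found), the sketch below recomputes the score from three inputs. `score_well_formed` is a hypothetical helper and not part of the training code; it only mirrors the `reward * confidence + 0.3 * clarity_bonus` step with the constants listed above, and the input values are illustrative.

```python
CORRECT = 1.5
INCORRECT = -1.0


def score_well_formed(is_correct, confidence, n_words):
    """Score a completion whose final-answer letter was parsed successfully."""
    base = CORRECT if is_correct else INCORRECT
    clarity_bonus = max(0, 1 - n_words / 200)  # reaches zero at 200 words
    return round(base * confidence + 0.3 * clarity_bonus, 3)


print(score_well_formed(True, 1.0, 20))    # 1.5 * 1.0 + 0.3 * 0.9 = 1.77
print(score_well_formed(True, 0.8, 100))   # 1.5 * 0.8 + 0.3 * 0.5 = 1.35
print(score_well_formed(False, 1.0, 250))  # -1.0 * 1.0 + 0.3 * 0.0 = -1.0
print(score_well_formed(False, 0.5, 20))   # -1.0 * 0.5 + 0.3 * 0.9 = -0.23
```

The last case shows a property of the formula worth keeping in mind: a wrong answer that states 50% confidence is penalized less than a wrong answer with no stated confidence, because the base reward is scaled by the confidence value.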
| Setting | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 5e-6 |
| Epochs | 1 |
| Reward Type | Scalar |
| Engine | Unsloth Reinforcement Fine-tuning |
| Hardware | 1xT4 16GB |
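prompt = """
Passage: The fox saw the grapes hanging high and decided they were sour.
Question: What does the phrase "sour grapes" mean?
Choices:
A. The fox liked the grapes.
B. The fox couldn't reach the grapes, so he pretended not to care.
C. The grapes were actually sour.
D. The fox wanted to share the grapes.
Final answer:
"""

# Assumes `model` and `tokenizer` come from the FastModel.from_pretrained call.
# generate() expects token IDs rather than a raw string, so tokenize first.
inputs = tokenizer(text=prompt, return_tensors="pt").to(model.device)
# 128 new tokens is an arbitrary cap chosen for this example.
outputs = model.generate(**inputs, max_new_tokens=128)
# The decoded text contains the prompt followed by the model's completion.
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)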
Example Output:
The phrase "sour grapes" refers to pretending not to care about something you cannot have.
Final answer: B
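
To build prompts in the same layout from dataset records and to read the predicted letter back out of a completion, something like the following sketch may help. `build_prompt` and `extract_letter` are hypothetical helper names, and the record fields `article`, `question`, `options`, and `answer` are an assumption about the RACE schema that should be checked against the dataset you load. The regular expression is the same one used by the reward function.

```python
import re

LETTER_RE = re.compile(r'final(?:\s*answer)?\s*:\s*([A-D])', re.IGNORECASE)


def build_prompt(example):
    """Format one record in the Passage / Question / Choices layout."""
    choices = "\n".join(
        f"{letter}. {option}" for letter, option in zip("ABCD", example["options"])
    )
    return (
        f"Passage: {example['article']}\n"
        f"Question: {example['question']}\n"
        f"Choices:\n{choices}\n"
        "Final answer:"
    )


def extract_letter(text):
    """Return the last final-answer letter in `text`, or None if there is none."""
    matches = LETTER_RE.findall(text)
    return matches[-1].upper() if matches else None


# A made-up record using the assumed field names.
record = {
    "article": "The fox saw the grapes hanging high and decided they were sour.",
    "question": 'What does the phrase "sour grapes" mean?',
    "options": [
        "The fox liked the grapes.",
        "The fox couldn't reach the grapes, so he pretended not to care.",
        "The grapes were actually sour.",
        "The fox wanted to share the grapes.",
    ],
    "answer": "B",
}

print(build_prompt(record))
predicted = extract_letter("He pretended not to care.\nFinal answer: B")
print(predicted == record["answer"])
```

Taking the last match mirrors the reward function, so a completion that revises its answer is judged on its final statement.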
If you use this model or the reward function, please cite:
@misc{gemma3_race_unsloth,
title = {Gemma-3 Fine-tuned on RACE with Unsloth and GRPO Reward},
author = {Samuel Lopera Torres},
year = {2025},
howpublished = {\url{https://huggingface.co/Senshi5620/gemma-3-finetune}},
}