GEM/schema_guided_dialog
How to use Arittro2/gemma3-4b-sgd-grpo with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Arittro2/gemma3-4b-sgd-grpo", dtype="auto")

How to use Arittro2/gemma3-4b-sgd-grpo with Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Arittro2/gemma3-4b-sgd-grpo to start chatting

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Arittro2/gemma3-4b-sgd-grpo to start chatting

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Arittro2/gemma3-4b-sgd-grpo to start chatting
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Arittro2/gemma3-4b-sgd-grpo",
    max_seq_length=2048,
)

This model is a fine-tuned version of unsloth/gemma-3-4b-it using GRPO (Group Relative Policy Optimization) on the Schema-Guided Dialog dataset.
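For readers unfamiliar with GRPO, the "group relative" part can be illustrated with a small sketch: several completions are sampled for the same prompt, each is scored, and every score is normalized against the mean and standard deviation of its own group. The function name, the `eps` value, the use of the population standard deviation, and the reward numbers below are all assumptions made for illustration; they are not taken from this model's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: population std and eps are illustrative choices.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every completion scores the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up total rewards for four sampled completions of one prompt.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group's mean receive a positive advantage and are reinforced, while those below the mean are discouraged, so no separate value model is needed to provide a baseline.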
The model was trained using three reward functions:
from unsloth import FastLanguageModel
import torch
# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Arittro2/gemma3-4b-sgd-grpo",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
# Prepare for inference
FastLanguageModel.for_inference(model)
# Example prompt
prompt = """You are a helpful virtual assistant. Generate an appropriate response.
<CONTEXT>
User: I need to book a restaurant for dinner tonight.
System: I can help you with that. What type of cuisine are you interested in?
Dialog acts to realize:
- Act: REQUEST, Slot: location
</CONTEXT>
Generate a natural, helpful response between <RESPONSE> and </RESPONSE> tags."""
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
# Generate
output = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
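Because the prompt asks for the reply between <RESPONSE> and </RESPONSE> tags, the decoded text usually needs one more step to isolate the reply. The sketch below is one possible way to do that; `extract_response` and the sample string are hypothetical and not part of the model's published code. It takes the last tag pair because the decoded output above also contains the prompt, whose instruction sentence mentions the same tags.

```python
import re

# Non-greedy match so each tag pair is captured separately; DOTALL allows
# multi-line replies.
RESPONSE_PATTERN = re.compile(r"<RESPONSE>(.*?)</RESPONSE>", re.DOTALL)


def extract_response(decoded_text):
    """Return the text inside the last <RESPONSE>...</RESPONSE> pair, or None.

    Hypothetical helper for illustration only.
    """
    matches = RESPONSE_PATTERN.findall(decoded_text)
    if not matches:
        return None
    return matches[-1].strip()


# Made-up decoded text: the prompt's instruction line followed by a reply.
sample = (
    "Generate a natural, helpful response between <RESPONSE> and </RESPONSE> tags.\n"
    "<RESPONSE>Sure! Which city should I search in?</RESPONSE>"
)
reply = extract_response(sample)
print(reply)
```

If the model emits no tags at all, the only match is the pair inside the prompt itself, so decoding just the newly generated tokens (for example `output[0][inputs["input_ids"].shape[1]:]`) before calling the helper is likely the safer input.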
Trained using the Unsloth framework for efficient fine-tuning with:
@misc{gemma3-sgd-grpo,
  author = {Arittro2},
  title = {Gemma-3-4B Fine-tuned on Schema-Guided Dialog with GRPO},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Arittro2/gemma3-4b-sgd-grpo}}
}
This model inherits the Gemma license from the base model.