
gemma-4-21b-reap-harness-ready

This is a fine-tuned version of 0xSero/gemma-4-21b-a4b-it-REAP, trained on Claude conversations that include tool use.

Attribution & Licenses

Base Model

This model is based on:

  • Gemma 4 by Google DeepMind
  • 0xSero/gemma-4-21b-a4b-it-REAP - A specialized fine-tune of Gemma 4

Gemma 4 is licensed under the Gemma License: https://ai.google.dev/gemma/terms

Training Data

  • Dataset: Private Claude conversations (agent-dataset-unsloth)
  • Source: Conversations generated using Anthropic's Claude (Claude Code)
  • License: Private dataset - not for redistribution

Training Framework

This model was fine-tuned using:

  • Transformers by Hugging Face (Apache 2.0)
  • PEFT (Parameter-Efficient Fine-Tuning) by Hugging Face (Apache 2.0)
  • bitsandbytes for 4-bit quantization (MIT)
  • Unsloth for optimized training (Apache 2.0)

Developer

Fine-tuned by: Austin Dixson
Training Date: April 2025
Status: Active development - iteration 1/10

Training Details

  • Base Model: 0xSero/gemma-4-21b-a4b-it-REAP
  • Training Steps: 325/1500 (22% complete)
  • Loss: ~2.708
  • Dataset: Private Claude conversations (agent-dataset-unsloth)
  • Training Method: LoRA (Low-Rank Adaptation); a configuration sketch follows this list
    • Rank (r): 16
    • Alpha: 16
    • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
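
The LoRA settings above map onto a PEFT LoraConfig roughly as sketched below; values not listed in this card (dropout, bias handling) are assumptions.

from peft import LoraConfig

# Minimal sketch of the adapter configuration described above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,   # assumption: dropout is not stated in the card
    bias="none",        # assumption
    task_type="CAUSAL_LM",
)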

Capabilities

This model has been fine-tuned for:

  • One-shot coding - Writing code from single examples
  • Tool-driven agent loops - Using tools autonomously
  • Function calling - OpenAI-style function calling (a schema sketch follows this list)
  • Autonomous research - Self-directed problem solving
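
For the OpenAI-style function calling above, a tool is described with a JSON schema that is exposed to the model, which then replies with a structured call. The exact prompt format used in training is not documented here, so the schema below and the way it is injected into the prompt are assumptions, a minimal sketch only.

import json

# Hypothetical tool definition in the OpenAI function-calling schema.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Assumption: the tool list is surfaced to the model via the system prompt
# and the model answers with a JSON object naming the tool and its arguments.
system_prompt = (
    "You may call the following tools. To invoke one, reply with a JSON "
    'object of the form {"name": ..., "arguments": {...}}.\n'
    + json.dumps([get_weather], indent=2)
)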

Tools Integrated

  • divideandconquer
  • PinchBench
  • WildClawBench
  • hotAsianIntern

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model in 4-bit (NF4, matching the training quantization)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "0xSero/gemma-4-21b-a4b-it-REAP",
    device_map="auto",
    torch_dtype=torch.float16,
    quantization_config=bnb_config,
)

# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, "austindixson/gemma-4-21b-reap-harness-ready")
tokenizer = AutoTokenizer.from_pretrained("austindixson/gemma-4-21b-reap-harness-ready")

# Use the model (instruction-tuned Gemma models expect the chat template)
messages = [{"role": "user", "content": "How do I create a REST API in Python?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
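
If a standalone (adapter-free) checkpoint is needed, the LoRA weights can be merged into the base model. A minimal sketch, assuming the base is reloaded in float16 first (merging into 4-bit quantized weights is not supported); the output directory name is arbitrary.

# Reload the base in half precision, attach the adapters, merge, and save.
base_fp16 = AutoModelForCausalLM.from_pretrained(
    "0xSero/gemma-4-21b-a4b-it-REAP",
    torch_dtype=torch.float16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(
    base_fp16, "austindixson/gemma-4-21b-reap-harness-ready"
).merge_and_unload()
merged.save_pretrained("gemma-4-21b-reap-merged")
tokenizer.save_pretrained("gemma-4-21b-reap-merged")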

Training Configuration

  • Max Sequence Length: 2048 tokens
  • Batch Size: 2 per device × 4 gradient accumulation steps = effective batch of 8
  • Learning Rate: 2e-4
  • Quantization: 4-bit (NF4 quantization)
  • Optimizer: AdamW 8-bit
  • Scheduler: cosine with 10 warmup steps
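
As a rough reference, the hyperparameters above map onto Hugging Face TrainingArguments as sketched below. The actual run used Unsloth's optimized trainer, values not listed in the card (output directory, logging cadence) are placeholders, and the 2048-token max sequence length is configured on the trainer/tokenizer side rather than here.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",                # placeholder
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,       # 2 per device x 4 = 8 effective
    learning_rate=2e-4,
    max_steps=1500,
    warmup_steps=10,
    lr_scheduler_type="cosine",
    optim="adamw_bnb_8bit",              # AdamW 8-bit via bitsandbytes
    fp16=True,
    logging_steps=25,                    # placeholder
)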

Hardware

Trained on H100 GPU (80GB HBM3) with 4-bit quantization for memory efficiency.

Iteration Plan

This model is part of a 10x iteration workflow:

  1. Train → Benchmark → Auto-research → Prune → Deploy
  2. Current status: First iteration checkpoint (step 325)

License

This model inherits the license from the base Gemma 4 model. See the Gemma License for usage terms.


Acknowledgments

  • Google DeepMind for creating the Gemma 4 model
  • 0xSero for the REAP fine-tune of Gemma 4
  • Anthropic for Claude (Claude Code) used to generate training data
  • Hugging Face for the Transformers, PEFT, and bitsandbytes libraries
  • Unsloth for the optimized training framework