Gemma4 26B MoE — Kimi K2 Reasoning LoRA 🧠

LoRA adapter fine-tuned from google/gemma-4-26B-A4B-it on the Kimi K2 reasoning distill dataset (7,836 high-quality reasoning examples), trained entirely by UKA (Hermes Agent) 🤖

📋 Summary

Detail          Value
Base Model      google/gemma-4-26B-A4B-it (26B MoE, 128 experts, ~4B active/token)
Dataset         lordx64/reasoning-distill-kimi-k2-6-max-sft (7,836 examples)
Method          Custom NF4 per-expert quantization + LoRA
Pipeline        AndriejusNak/gemma4-26b-moe-finetune
GPU             NVIDIA RTX 5090 32GB (Vast.ai Cloud)
Training Time   128 minutes (~2h 8m)
Best Loss       1.0651
NaN Explosions  0

🖥️ Hardware

Component  Specification
GPU        NVIDIA GeForce RTX 5090 32GB GDDR7
CPU        Intel Core i7-14700K (20 cores, 28 threads)
RAM        94 GB DDR5
Disk       200 GB NVMe SSD
Cloud      Vast.ai
CUDA       13.0
PyTorch    2.12.0.dev (nightly, cu128)

Why RTX 5090: Gemma 4 26B MoE requires custom per-expert NF4 quantization; standard bitsandbytes cannot quantize nn.Parameter (expert weights). The pipeline quantizes the experts itself, which puts the VRAM peak at ~24 GB: a fit for the RTX 5090's 32 GB, but over the RTX 3090's 24 GB (when using seq=1024 + MLP LoRA).
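
For illustration, a minimal sketch of per-expert quantization using bitsandbytes' functional API, which operates on raw tensors and so sidesteps the nn.Parameter limitation. This is an assumption about the approach, not the pipeline's actual code; see AndriejusNak/gemma4-26b-moe-finetune for the real implementation.

# Sketch: quantize raw expert nn.Parameter tensors to NF4 by hand.
# Illustrative only — the pipeline's real implementation may differ.
import torch
import bitsandbytes.functional as bnbF

def quantize_expert_weights(model, name_filter="experts"):
    """Quantize matching expert weights to NF4, keeping quant states around."""
    quant_states = {}
    for name, param in model.named_parameters():
        if name_filter in name and param.dim() >= 2:
            q, state = bnbF.quantize_nf4(param.data.cuda())  # 4-bit storage
            quant_states[name] = (q, state)
            param.data = torch.empty(0, device=param.device)  # free BF16 copy
    return quant_states

# At forward time, a hook dequantizes the expert weight just-in-time:
#   w = bnbF.dequantize_nf4(q, state)   # back to the original dtype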

🔧 Training Configuration

# v6_26b_pipeline.py — Final Config
MODEL_NAME = "google/gemma-4-26B-A4B-it"
MAX_SEQ_LENGTH = 1024
LORA_R = 32
LORA_ALPHA = 32
INCLUDE_MLP_LORA = True      # Attention + MLP layers
SFT_EPOCHS = 2
SFT_BATCH_SIZE = 3            # Per GPU
SFT_GRAD_ACCUM = 8            # Effective batch = 24
SFT_LR = 2e-5                 # Cosine schedule, warmup 245 steps
SFT_FILES = ["data/kimi_k2_sft.jsonl"]
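
For reference, a sketch of how the optimizer and cosine schedule could be wired up with the standard transformers helper. The optimizer choice (AdamW) and this exact wiring are assumptions; only the numbers come from the config above.

# Sketch (assumed wiring): AdamW + cosine LR schedule with warmup.
import torch
from transformers import get_cosine_schedule_with_warmup

trainable = [p for p in model.parameters() if p.requires_grad]  # LoRA params only
optimizer = torch.optim.AdamW(trainable, lr=2e-5)               # SFT_LR
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=245,    # warmup steps from the config comment
    num_training_steps=613,  # optimizer steps (see Training Stats below)
)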

LoRA Details

  • Rank (r): 32, Alpha: 32
  • Target modules: q_proj, k_proj, v_proj, o_proj (attention) + gate_proj, up_proj, down_proj (MLP)
  • Trainable params: 59,275,776 / 3,027,224,428 (1.96%)
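
The equivalent peft LoraConfig would look roughly like this (a reconstruction from the bullets above; lora_dropout and other unlisted fields are assumptions):

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention
        "gate_proj", "up_proj", "down_proj",      # MLP
    ],
    lora_dropout=0.0,        # assumed; not stated in this card
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report ~1.96% trainable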

Training Stats

  • Examples: 7,836 → 7,358 after filtering out 478 all-masked examples (see the sketch after this list)
  • Forward passes: 4,906
  • Optimizer steps: 613
  • VRAM peak: 23.9 GB
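
The all-masked filter referenced in the first bullet drops examples whose loss labels are entirely -100, i.e. no assistant tokens survive tokenization and truncation. A minimal sketch, with illustrative names rather than the pipeline's own:

IGNORE_INDEX = -100  # tokens with this label contribute no loss

def has_trainable_tokens(labels):
    return any(tok != IGNORE_INDEX for tok in labels)

# 7,836 → 7,358: 478 examples had every label masked and were dropped
examples = [ex for ex in examples if has_trainable_tokens(ex["labels"])]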

Loss Progression

Step  50: Loss 3.0597  (epoch 1)
Step 100: Loss 1.3277
Step 150: Loss 1.1658
Step 200: Loss 1.0906
Step 250: Loss 1.1220
Step 300: Loss 1.0723
  → Epoch 1 avg: 1.4648
Step 350: Loss 1.0660  (epoch 2)
Step 400: Loss 1.0616
Step 450: Loss 1.0722
Step 500: Loss 1.0586
Step 550: Loss 1.0370
Step 600: Loss 1.0983
  → Epoch 2 avg: 1.0651 🎯 Best!

🚀 Usage

Install Dependencies

pip install transformers peft torch accelerate   # accelerate is needed for device_map="auto"

Load Base Model + LoRA

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model (BF16, needs ~52 GB VRAM)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-26B-A4B-it",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load this LoRA adapter
model = PeftModel.from_pretrained(
    model,
    "hotdogs/gemma4-26b-kimi-k2-reasoning-lora"
)

# Optional: merge for faster inference
model = model.merge_and_unload()

Chat / Inference

tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-26B-A4B-it")

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Solve: 3x + 7 = 22"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

🧪 How This Was Trained

This adapter was trained autonomously by UKA, an AI Agent running Hermes Agent, following this workflow:

1. Dataset Conversion

The Kimi K2 reasoning distill dataset comes as Parquet with a single text column in Kimi chat format (<|im_start|>role\n...<|im_end|>).

# convert_kimi.py — Parquet → JSONL messages format
import io, json, os, re

import pyarrow.parquet as pq
import requests

url = "https://huggingface.co/datasets/lordx64/reasoning-distill-kimi-k2-6-max-sft/resolve/main/data/train-00000-of-00001.parquet"
r = requests.get(url)
r.raise_for_status()
table = pq.read_table(io.BytesIO(r.content))
texts = table.column("text").to_pylist()

# Split each record's <|im_start|>role\n...<|im_end|> turns into messages
pattern = r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>"
os.makedirs("data", exist_ok=True)
with open("data/kimi_k2_sft.jsonl", "w", encoding="utf-8") as f:
    for text in texts:
        matches = re.findall(pattern, text, re.DOTALL)
        messages = [{"role": role.strip(), "content": content.strip()}
                    for role, content in matches]
        f.write(json.dumps({"messages": messages}, ensure_ascii=False) + "\n")

2. Pipeline Setup

git clone https://github.com/AndriejusNak/gemma4-26b-moe-finetune.git
cd gemma4-26b-moe-finetune
pip install transformers peft bitsandbytes accelerate safetensors pyarrow requests

# Edit v6_26b_pipeline.py:
#   SFT_FILES = ["data/kimi_k2_sft.jsonl"]
#   MAX_SEQ_LENGTH = 1024
#   LORA_R = 32, LORA_ALPHA = 32
#   INCLUDE_MLP_LORA = True
#   SFT_EPOCHS = 2, SFT_BATCH_SIZE = 3

3. Download Base Model + Train

python3 v6_26b_pipeline.py --phase 0                          # Download model (~7 min)
python3 -u v6_26b_pipeline.py --phase 1 | tee /tmp/sft.log    # Train (~2 hrs)

Hardware Notes

  • Why RTX 5090 needed: Gemma 4 26B MoE requires custom NF4 quantization. Standard bitsandbytes can't quantize nn.Parameter (expert weights). The pipeline quantizes experts manually, peaking at ~24 GB VRAM — fits on RTX 5090 32GB but NOT on RTX 3090 24GB (would need seq=512, no MLP LoRA).
  • Why PyTorch nightly: RTX 5090 = Blackwell sm_120. PyTorch stable only supports up to sm_90. Nightly cu128 is required.
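
For reference, the nightly build typically installs with a command like the following (the usual pattern, not copied from the training logs; check pytorch.org for the current index URL):

pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128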

📦 Files in This Repo

adapter_model.safetensors   — LoRA weights (227 MB)
adapter_config.json         — LoRA config: r=32, alpha=32, attention+MLP
tokenizer.json              — Gemma 4 tokenizer (31 MB)
tokenizer_config.json       — Tokenizer config
chat_template.jinja         — Chat template

⚠️ Limitations

  • 32% of training examples were truncated at seq=1024 (mean length: 941 tokens); see the sketch after this list
  • LoRA adapter only — not a full fine-tune
  • Trained on Kimi K2 reasoning style — may differ from Gemma's native output style
  • BF16 base model requires ~52 GB VRAM
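
The truncation statistic in the first bullet can be reproduced roughly as follows (illustrative; assumes examples loaded from data/kimi_k2_sft.jsonl and the tokenizer from the Usage section):

lengths = [
    len(tokenizer.apply_chat_template(ex["messages"], tokenize=True))
    for ex in examples
]
mean_len = sum(lengths) / len(lengths)                      # ≈ 941 tokens
truncated = sum(l > 1024 for l in lengths) / len(lengths)   # ≈ 32%
print(f"mean={mean_len:.0f} tokens, truncated={truncated:.0%}")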

🙏 Credits
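
  • Base model: google/gemma-4-26B-A4B-it (Google)
  • Dataset: lordx64/reasoning-distill-kimi-k2-6-max-sft
  • Training pipeline: AndriejusNak/gemma4-26b-moe-finetune
  • Training run: UKA (Hermes Agent) on Vast.ai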
