A tiny version of google/gemma-4-26B-A4B for testing and development.
| Parameter | Original | Tiny |
|---|---|---|
| **Text Model** | | |
| Hidden Layers | 30 | 6 |
| Layer Types | [5× sliding, 1× full] × 5 | [5× sliding, 1× full] × 1 |
| Hidden Size | 2816 | 2048 |
| Intermediate Size | 2112 | 1536 |
| Attention Heads | 16 | 16 |
| KV Heads | 8 | 8 |
| Global KV Heads | 2 | 2 |
| Head Dimension | 256 | 128 |
| Global Head Dimension | 512 | 256 |
| **MoE** | | |
| Num Experts | 128 | 16 |
| Top-K Experts | 8 | 8 |
| MoE Intermediate Size | 704 | 512 |
| **Vision Model** | | |
| Hidden Layers | 27 | 6 |
| Hidden Size | 1152 | 768 |
| Intermediate Size | 4304 | 2048 |
| Attention Heads | 16 | 12 |
| KV Heads | 16 | 12 |
| Head Dimension | 72 | 64 |
| Global Head Dimension | 72 | 64 |
| **Common** | | |
| Vocab Size | 262144 | 262144 |
| Max Position Embeddings | 262144 (text), 131072 (vision) | 262144 (text), 131072 (vision) |
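Two quick sanity checks on the table. First, the layer-type pattern `[5× sliding, 1× full]` should expand to the listed layer counts (30 for the original, 6 for the tiny model). Second, because the vocabulary is unchanged, the input embedding matrix stays large even after shrinking the hidden size. The sketch below assumes layer types are labelled `"sliding_attention"` / `"full_attention"` as in Hugging Face Gemma 3 configs; the actual Gemma 4 config field names may differ, and `expand_layer_pattern` is a hypothetical helper.

```python
# Sketch: expand the "[5x sliding, 1x full] x N" pattern from the table.
# Assumption: labels follow the Hugging Face Gemma 3 convention
# ("sliding_attention" / "full_attention"); Gemma 4 may name them differently.


def expand_layer_pattern(repeats: int, sliding: int = 5, full: int = 1) -> list[str]:
    """Return one layer-type label per decoder layer (hypothetical helper)."""
    block = ["sliding_attention"] * sliding + ["full_attention"] * full
    return block * repeats


original = expand_layer_pattern(repeats=5)
tiny = expand_layer_pattern(repeats=1)

print(len(original), original.count("full_attention"))  # 30 5
print(len(tiny), tiny.count("full_attention"))          # 6 1

# The vocabulary is unchanged, so the input embedding matrix alone has
# vocab_size x hidden_size parameters in the tiny model.
vocab_size, tiny_hidden = 262_144, 2_048
embedding_params = vocab_size * tiny_hidden
print(f"{embedding_params:,}")  # 536,870,912
```

That is about 0.54B parameters in the embedding table alone. If the "1.1B" in the repository name is the total parameter count, roughly half of the tiny model sits in its input embedding.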
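The model is saved as a single safetensors file (`model.safetensors`) containing all weights. The architecture keeps the same structure as the original Gemma4 model (the same sliding/full attention pattern, mixture-of-experts layout, and vocabulary), scaled down as shown in the table above.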
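Because all weights live in one file, the parameter count can be checked without loading the model. A safetensors file begins with an 8-byte little-endian unsigned integer giving the length of a JSON header; that header maps each tensor name to its `dtype`, `shape`, and `data_offsets`, plus an optional `__metadata__` entry. The following standard-library sketch reads only the header and sums the tensor sizes; `count_safetensors_params` is a hypothetical name, not part of any library.

```python
import json
import math
import struct


def count_safetensors_params(path: str) -> int:
    """Sum the element counts of all tensors listed in a safetensors header.

    Only the header is read, so no weights are loaded into memory.
    """
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is optional string metadata, not a tensor.
    header.pop("__metadata__", None)
    # math.prod([]) == 1, so scalar tensors (shape []) count as one element.
    return sum(math.prod(info["shape"]) for info in header.values())
```

To run it on this model, download `model.safetensors` first (for example with `huggingface_hub.hf_hub_download`) and pass the returned local path.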
The model has been validated to load with `AutoModelForCausalLM.from_pretrained()`.

The model was fine-tuned on a toy dataset of internet copypastas:
Prompt: "According to all known laws"
Output: "According to all known laws of aviation, there is no way a bee should be able to fly. Its wings are too small to get its fat little body off the ground. The bee, of course, flies anyway because bees don't care what humans think is impossible."
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inference-optimization/Gemma4-26B-1.1B-tiny"

# Load the tiny model; device_map="auto" places the weights on the
# available device(s) and requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

prompt = "According to all known laws"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# max_length counts the prompt tokens plus the generated tokens.
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
License: same as the base model (Gemma License).
This model was created using the llm-compressor create-tiny-model skill.