Gemma 4 31B IT — IQ2_M (CLI-focused imatrix)

Custom IQ2_M quantization of google/gemma-4-31b-it with an importance matrix calibrated on CLI / shell-assistant traces.

  • Size: 10.17 GB (6× smaller than f16)
  • BPW: 2.84
  • Layers: 60 (unmodified)
  • Calibration: imatrix from 4 chunks of CLI/bash-focused text
  • Tool: llama.cpp (llama-quantize --imatrix ...)
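
The calibrate-then-quantize flow summarized in the list above can be sketched roughly as follows. This is a minimal illustration, not the exact commands used for this repo: the source only confirms `llama-quantize --imatrix ...`. The f16 and calibration file names are hypothetical placeholders, and `--chunks 4` is an assumption that mirrors the "4 chunks" figure. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: input file names are hypothetical placeholders.
set -eu

F16_MODEL="${F16_MODEL:-gemma4-31b-f16.gguf}"     # assumed f16 GGUF export
CALIB_TEXT="${CALIB_TEXT:-cli-calibration.txt}"   # assumed CLI/bash-focused text
IMATRIX="${IMATRIX:-gemma31b-imatrix.dat}"        # importance matrix file
OUT_MODEL="${OUT_MODEL:-gemma4-31b-IQ2_M.gguf}"   # quantized output
DRY_RUN="${DRY_RUN:-1}"                           # 1 = print commands only

# Print a command, and execute it only when DRY_RUN is not 1.
run() {
  printf '%s\n' "$*"
  if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

# 1. Collect activation statistics over the calibration text.
run llama-imatrix -m "$F16_MODEL" -f "$CALIB_TEXT" -o "$IMATRIX" --chunks 4

# 2. Quantize to IQ2_M, using the importance matrix to weight rounding error.
run llama-quantize --imatrix "$IMATRIX" "$F16_MODEL" "$OUT_MODEL" IQ2_M
```

Keeping the calibration text close to the target workload (here, shell-assistant traces) is the design choice the imatrix step exists for: the weights that matter most on that text are the ones whose quantization error is penalized most.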

NL2Bash benchmark (50 examples, Stanford/Tellina test split)

| Metric | This IQ2_M | f16 baseline | Unsloth UD-IQ2_M |
|---|---|---|---|
| Size | 10.17 GB | 61.4 GB | 10.75 GB |
| Char-F1 | 84.71% | 84.76% | 84.02% |
| BLEU-1 | 44.72 | 43.94 | 42.38 |
| BLEU-2 | 34.36 | 33.84 | 31.73 |
| BLEU-4 | 22.39 | 21.02 | 18.64 |
| Exact | 12% | 12% | 10% |

This IQ2_M matches f16 on Char-F1 (84.71% vs 84.76%) and beats it on BLEU-4 (22.39 vs 21.02) at roughly one-sixth the size. It also beats Unsloth UD-IQ2_M on every metric while being about 0.6 GB smaller.
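
The size and exact-match figures above reduce to simple arithmetic, which can be checked with a few lines of shell. The helper names (`ratio`, `diff_gb`, `count`) are made up for this sketch; the numbers are copied from the table.

```shell
#!/bin/sh
# Helper names are illustrative; inputs come from the NL2Bash table.

# Ratio of two sizes, two decimals.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'; }

# Difference of two sizes in GB, two decimals.
diff_gb() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a - b }'; }

# Number of examples that a percentage represents, rounded to nearest.
count() { awk -v p="$1" -v n="$2" 'BEGIN { printf "%d", p * n / 100 + 0.5 }'; }

echo "f16 / IQ2_M size ratio: $(ratio 61.4 10.17)x"
echo "Saving vs UD-IQ2_M:     $(diff_gb 10.75 10.17) GB"
echo "Exact matches of 50:    $(count 12 50) (this IQ2_M, f16) vs $(count 10 50) (UD-IQ2_M)"
```

On a 50-example split each example is worth 2 percentage points of Exact, so small gaps in that row correspond to a single command.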

Run with: llama-cli -m gemma4-31b-IQ2_M.gguf -ngl 99 --ctx-size 4096 --temp 0.1

Cross-model benchmark comparison (first public eval)

Note on BFCL scores: All BFCL scores reported here use our internal simplified evaluation (single-function-call subset with custom prompt/scoring), NOT the official Berkeley Function Calling Leaderboard methodology. Our scores are not directly comparable to the official leaderboard. We are working on running the official BFCL evaluation for comparable numbers.

| Model | Size | Active params | HumanEval+ | MBPP+ | BFCL v3 | NL2Bash F1 |
|---|---|---|---|---|---|---|
| This repo (Gemma 4 31B IQ2_M) | 10.4 GB | 31B | 88.41% | 82.01% | 92.25% | 84.71% |
| Qwen3.6 Q8_0 (≈f16) | 35.2 GB | 3B | 81.10% | 82.80% | 95.25% | - |
| Qwen3.6 IQ2_M (sibling) | 11.1 GB | 3B | 80.49% | 78.31% | 94.75% | 81.63% |
| Gemma 4 E4B Q8_0 | 7.8 GB | 4.5B | 73.78% | 73.28% | 93.75% | 79.75% |

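Among the models compared here, this one is the strongest for code generation in the 10-12 GB tier: its HumanEval+ score of 88.41% beats both Qwen3.6 variants, including the Q8_0 (≈f16) build at 35.2 GB. On MBPP+ it leads the similarly sized Qwen3.6 IQ2_M (82.01% vs 78.31%) but sits slightly below Qwen3.6 Q8_0 (82.80%).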

How this compares

Full comparison (8 quantizations, layer-importance study, ablation charts) at otter.utopiaia.com and github.com/KikoCisBot/gemma4-31b-study.

At ~2.7 BPW, standard Q2_K collapses (F1 58.6%). IQ2_M with a CLI-tuned imatrix holds 84.7% Char-F1, within 0.1 point of f16 (84.76%) on the NL2Bash benchmark above, so for shell-command generation it is functionally equivalent to f16 on this test.

Quickstart

# Download
huggingface-cli download KikoCis/gemma-4-31b-it-IQ2_M-GGUF gemma4-31b-IQ2_M.gguf --local-dir .

# Run
llama-cli -m gemma4-31b-IQ2_M.gguf -ngl 99 --ctx-size 4096 \
  --temp 0.1 -p "List all processes using port 8080"

Files

  • gemma4-31b-IQ2_M.gguf — quantized weights
  • gemma31b-imatrix.dat — importance matrix used for calibration

Citation

If you find this useful, consider starring the otter repo.


Real-World Agent Test Warning (April 2026)

Benchmark scores do not predict agent capability. In Docker-based autonomous testing, fine-tuned E4B models (95% BFCL) scored 0/10 while the unfine-tuned base scored 6/10. Fine-tuning for BFCL destroyed general reasoning (error recovery, strategy adaptation, anti-repetition). Fine-tuned E4B models have been withdrawn.

For autonomous agent tasks, use the base Gemma 4 model or a larger model at higher BPW. See: The Benchmark Trap — Full Study
