---
license: gemma
base_model: GestaltLabs/Gemma-4-E4B-SABER
tags:
  - gguf
  - llama.cpp
  - gemma4
  - saber
  - abliteration
  - refusal-ablation
  - representation-engineering
  - image-text-to-text
pipeline_tag: image-text-to-text
---

# Gemma-4-E4B-SABER GGUF

This repository contains GGUF conversions of GestaltLabs/Gemma-4-E4B-SABER for llama.cpp-compatible runtimes.

The source model is a Gemma 4 E4B instruction model modified with the SABER/refusal-ablation workflow. These files preserve the source tokenizer and chat template metadata in GGUF form.

## Files

| File | Quantization | Approx. size |
| --- | --- | --- |
| Gemma-4-E4B-SABER-BF16.gguf | BF16 | 13.92 GiB |
| Gemma-4-E4B-SABER-Q8_0.gguf | Q8_0 | 7.43 GiB |
| Gemma-4-E4B-SABER-Q6_K.gguf | Q6_K | 5.75 GiB |
| Gemma-4-E4B-SABER-Q5_K_M.gguf | Q5_K_M | 5.33 GiB |
| Gemma-4-E4B-SABER-Q4_K_M.gguf | Q4_K_M | 4.94 GiB |
| Gemma-4-E4B-SABER-Q3_K_M.gguf | Q3_K_M | 4.49 GiB |
| Gemma-4-E4B-SABER-Q2_K.gguf | Q2_K | 4.08 GiB |
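To fetch a single quantization, the `huggingface-cli` downloader works well. A minimal sketch, where `<repo-id>` is a placeholder for this repository's actual id:

```bash
# Download one quant file; <repo-id> is a placeholder for this repository's id.
huggingface-cli download <repo-id> Gemma-4-E4B-SABER-Q4_K_M.gguf --local-dir .
```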

## Quantization Notes

The BF16 GGUF was converted from the original Hugging Face safetensors checkpoint using a local llama.cpp build with Gemma 4 support. The quantized GGUF files were produced from that BF16 GGUF using llama-quantize.
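For reproducibility, the workflow looks roughly like the sketch below; paths are illustrative and assume a llama.cpp checkout built with Gemma 4 support.

```bash
# Convert the safetensors checkpoint to a BF16 GGUF (paths are illustrative).
python convert_hf_to_gguf.py /path/to/Gemma-4-E4B-SABER \
  --outtype bf16 --outfile Gemma-4-E4B-SABER-BF16.gguf

# Quantize the BF16 GGUF down to one of the listed types.
llama-quantize Gemma-4-E4B-SABER-BF16.gguf \
  Gemma-4-E4B-SABER-Q4_K_M.gguf Q4_K_M
```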

Recommended starting points:

- Q8_0: highest-quality quantized option.
- Q6_K: strong quality/size tradeoff.
- Q4_K_M: compact general-purpose option.
- Q2_K and Q3_K_M: smallest files, with larger quality tradeoffs.

No importance matrix was used.
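If you want imatrix-weighted quants, llama.cpp can compute an importance matrix from calibration text and apply it during quantization. A minimal sketch, assuming a calibration file of your choosing:

```bash
# Compute an importance matrix from calibration text (file names are illustrative).
llama-imatrix -m Gemma-4-E4B-SABER-BF16.gguf -f calibration.txt -o imatrix.dat

# Apply it when quantizing.
llama-quantize --imatrix imatrix.dat \
  Gemma-4-E4B-SABER-BF16.gguf Gemma-4-E4B-SABER-Q4_K_M.gguf Q4_K_M
```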

## Example

```bash
llama-cli \
  -m Gemma-4-E4B-SABER-Q4_K_M.gguf \
  -p "Explain quantum computing in simple terms."
```

Use a current llama.cpp build with Gemma 4 support.
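These files also work with `llama-server` if you prefer an HTTP endpoint. A minimal sketch, with illustrative context size and port:

```bash
# Serve the model over an OpenAI-compatible HTTP API
# (context size and port are illustrative).
llama-server -m Gemma-4-E4B-SABER-Q4_K_M.gguf -c 8192 --port 8080
```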

## Source

- Source model: GestaltLabs/Gemma-4-E4B-SABER
- Base model: google/gemma-4-E4B-it
- License: Gemma license. See the source model and base model license terms.

## Conversion Details

- Source format: Hugging Face safetensors, BF16.
- GGUF converter: llama.cpp `convert_hf_to_gguf.py`.
- GGUF quantizer: llama.cpp `llama-quantize`.
- Quant types: BF16, Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, Q2_K.
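To sanity-check that tokenizer and chat template metadata survived conversion, the `gguf-dump` tool from the `gguf` Python package (`pip install gguf`) can list a file's key-value pairs. A sketch:

```bash
# Dump GGUF metadata, including tokenizer and chat template keys.
gguf-dump Gemma-4-E4B-SABER-Q4_K_M.gguf
```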