# Gemma-4-E4B-SABER GGUF

This repository contains GGUF conversions of GestaltLabs/Gemma-4-E4B-SABER for llama.cpp-compatible runtimes.

The source model is a Gemma 4 E4B instruction model modified with the SABER/refusal-ablation workflow. These files preserve the source tokenizer and chat template metadata in GGUF form.

## How to use with llama.cpp

### Install via WinGet (Windows)

```bash
winget install llama.cpp
```

### Install via Homebrew (macOS/Linux)

```bash
brew install llama.cpp
```

### Use a pre-built binary

Download a release from https://github.com/ggerganov/llama.cpp/releases.

### Build from source

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Binaries land in ./build/bin/ (e.g. ./build/bin/llama-server).
```

### Run

The `-hf` flag downloads the model from this Hugging Face repository on first use.

```bash
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf GestaltLabs/Gemma-4-E4B-SABER-GGUF

# Or run inference directly in the terminal:
llama-cli -hf GestaltLabs/Gemma-4-E4B-SABER-GGUF
```

### Use Docker

```bash
docker model run hf.co/GestaltLabs/Gemma-4-E4B-SABER-GGUF
```
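Since the card's claim that tokenizer and chat template metadata survive conversion matters for chat use, it is worth spot-checking a downloaded file. A minimal sketch using the `gguf-dump` tool from the `gguf` Python package (a separate utility, not part of this repository):

```bash
pip install gguf

# Print the GGUF header metadata and filter for tokenizer keys;
# tokenizer.chat_template should appear among them:
gguf-dump Gemma-4-E4B-SABER-Q4_K_M.gguf | grep -i "tokenizer"
```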
## Files

| File | Quantization | Approx. size |
|---|---|---|
| Gemma-4-E4B-SABER-BF16.gguf | BF16 | 13.92 GiB |
| Gemma-4-E4B-SABER-Q8_0.gguf | Q8_0 | 7.43 GiB |
| Gemma-4-E4B-SABER-Q6_K.gguf | Q6_K | 5.75 GiB |
| Gemma-4-E4B-SABER-Q5_K_M.gguf | Q5_K_M | 5.33 GiB |
| Gemma-4-E4B-SABER-Q4_K_M.gguf | Q4_K_M | 4.94 GiB |
| Gemma-4-E4B-SABER-Q3_K_M.gguf | Q3_K_M | 4.49 GiB |
| Gemma-4-E4B-SABER-Q2_K.gguf | Q2_K | 4.08 GiB |
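If you would rather fetch a single quantization up front (for use with `-m`, as in the example further below) instead of letting the `-hf` flag handle downloads, one option is the Hugging Face CLI; a minimal sketch:

```bash
# Fetch just the Q4_K_M file into the current directory:
huggingface-cli download GestaltLabs/Gemma-4-E4B-SABER-GGUF \
  Gemma-4-E4B-SABER-Q4_K_M.gguf --local-dir .
```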
## Quantization Notes

The BF16 GGUF was converted from the original Hugging Face safetensors checkpoint using a local llama.cpp build with Gemma 4 support. The quantized GGUF files were produced from that BF16 GGUF using `llama-quantize`.

Recommended starting points:

- Q8_0: highest-quality quantized option.
- Q6_K: strong quality/size tradeoff.
- Q4_K_M: compact general-purpose option.
- Q2_K and Q3_K_M: smallest files, with larger quality tradeoffs.
No importance matrix was used.
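For reference, the two-step pipeline described above looks roughly like this (a sketch assuming a local llama.cpp checkout built as in the instructions earlier; the input path is illustrative):

```bash
# Step 1: convert the safetensors checkpoint to a BF16 GGUF:
python convert_hf_to_gguf.py /path/to/Gemma-4-E4B-SABER \
  --outtype bf16 --outfile Gemma-4-E4B-SABER-BF16.gguf

# Step 2: derive a quantized variant from the BF16 GGUF (no imatrix):
./build/bin/llama-quantize \
  Gemma-4-E4B-SABER-BF16.gguf Gemma-4-E4B-SABER-Q4_K_M.gguf Q4_K_M
```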
## Example

```bash
llama-cli \
  -m Gemma-4-E4B-SABER-Q4_K_M.gguf \
  -p "Explain quantum computing in simple terms."
```
Use a current llama.cpp build with Gemma 4 support.
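The same file also works with `llama-server` for an OpenAI-compatible endpoint. A minimal sketch (llama-server listens on port 8080 by default; the prompt mirrors the example above):

```bash
# Start the server:
llama-server -m Gemma-4-E4B-SABER-Q4_K_M.gguf

# From another shell, query the chat completions endpoint:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Explain quantum computing in simple terms."}]}'
```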
## Source

- Source model: GestaltLabs/Gemma-4-E4B-SABER
- Base model: google/gemma-4-E4B-it
- License: Gemma license. See the source model and base model license terms.
## Conversion Details

- Source format: Hugging Face safetensors, BF16.
- GGUF converter: llama.cpp `convert_hf_to_gguf.py`.
- GGUF quantizer: llama.cpp `llama-quantize`.
- Quant types: BF16, Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, Q2_K.
## Model tree

- google/gemma-4-E4B (pretrained base)
- google/gemma-4-E4B-it (instruction finetune)
- GestaltLabs/Gemma-4-E4B-SABER (SABER/refusal-ablation modification)
- GestaltLabs/Gemma-4-E4B-SABER-GGUF (this repository)