Instructions to use bartowski/google_gemma-3-27b-it-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use bartowski/google_gemma-3-27b-it-GGUF with llama-cpp-python:

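```
# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the chosen quant from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
    repo_id="bartowski/google_gemma-3-27b-it-GGUF",
    filename="google_gemma-3-27b-it-IQ2_M.gguf",
)

# Note: image parts are typically only processed when a multimodal chat
# handler and a matching mmproj file are configured; otherwise they may be ignored.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence.",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ]
)

# Print the assistant's reply text.
print(response["choices"][0]["message"]["content"])
```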

Local Apps

llama.cpp

How to use bartowski/google_gemma-3-27b-it-GGUF with llama.cpp:

Install from brew

```
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf bartowski/google_gemma-3-27b-it-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf bartowski/google_gemma-3-27b-it-GGUF:Q4_K_M
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf bartowski/google_gemma-3-27b-it-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf bartowski/google_gemma-3-27b-it-GGUF:Q4_K_M
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf bartowski/google_gemma-3-27b-it-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf bartowski/google_gemma-3-27b-it-GGUF:Q4_K_M
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf bartowski/google_gemma-3-27b-it-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf bartowski/google_gemma-3-27b-it-GGUF:Q4_K_M
```
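
Once `llama-server` is running from any of the install options above, it can be queried from code as well as from the web UI. The following is a minimal sketch using only the Python standard library. It assumes the server is listening on `http://localhost:8080` (adjust `BASE_URL` if you started it with a different host or port); the helper names `build_chat_request` and `ask` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

# Assumption: llama-server is reachable at this address.
BASE_URL = "http://localhost:8080"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, base_url: str = BASE_URL) -> str:
    """Send the prompt to the server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(ask("Say hello in one sentence."))` should print the model's reply. Building the request separately from sending it keeps the payload easy to inspect without a running server.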

Use Docker

```
docker model run hf.co/bartowski/google_gemma-3-27b-it-GGUF:Q4_K_M
```


vLLM

How to use bartowski/google_gemma-3-27b-it-GGUF with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "bartowski/google_gemma-3-27b-it-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bartowski/google_gemma-3-27b-it-GGUF",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```


Ollama
How to use bartowski/google_gemma-3-27b-it-GGUF with Ollama:
```
ollama run hf.co/bartowski/google_gemma-3-27b-it-GGUF:Q4_K_M
```

Unsloth Studio

How to use bartowski/google_gemma-3-27b-it-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for bartowski/google_gemma-3-27b-it-GGUF to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for bartowski/google_gemma-3-27b-it-GGUF to start chatting
```

Using HuggingFace Spaces for Unsloth

No setup required. Open https://huggingface.co/spaces/unsloth/studio in your browser and search for bartowski/google_gemma-3-27b-it-GGUF to start chatting.

Docker Model Runner
How to use bartowski/google_gemma-3-27b-it-GGUF with Docker Model Runner:
```
docker model run hf.co/bartowski/google_gemma-3-27b-it-GGUF:Q4_K_M
```

Lemonade

How to use bartowski/google_gemma-3-27b-it-GGUF with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull bartowski/google_gemma-3-27b-it-GGUF:Q4_K_M
```

Run and chat with the model

```
lemonade run user.google_gemma-3-27b-it-GGUF-Q4_K_M
```

List all available models

```
lemonade list
```

No IQ2_XXS on purpose?

by Kwissbeats - opened Mar 27, 2025

Discussion

Kwissbeats

Mar 27, 2025

Hello, sorry to bother you. I really appreciate your work!

Since I was pleasantly surprised by how good the QwQ quant was, I wonder
whether an IQ2_XXS version of Gemma would be less successful?

bartowski

Owner Mar 27, 2025

Yeah, it was a conscious decision; I have to put the cutoff somewhere 😅

What kind of card are you attempting to fit it on where 8.44GB is too big?

nkelly13

Mar 27, 2025

How much smaller could the IQ2_XXS be? If a 4060 with 8GB could run a Gemma 27B quant, that might be interesting to someone, but my guess is that IQ2_XXS would come in at ~8.1 GB or so anyway.
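
As a rough sanity check on size guesses like the one above, the weight payload of a quant can be approximated as parameter count times bits per weight. The sketch below is only a back-of-the-envelope estimate: the bits-per-weight values are assumed nominal figures for these quant types, 27e9 is the nominal parameter count, and real GGUF files come out larger because some tensors (for example embeddings and the output layer) are usually kept at higher precision.

```python
def estimate_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: nominal bits-per-weight figures, used here for illustration only.
NOMINAL_BPW = {"IQ2_XXS": 2.06, "IQ2_XS": 2.31, "IQ2_M": 2.7}

for name, bpw in NOMINAL_BPW.items():
    # 27e9 is the nominal parameter count of a "27B" model.
    print(f"{name}: ~{estimate_weights_gb(27e9, bpw):.2f} GB (weights only, lower bound)")
```

Treat the numbers as lower bounds. The useful part is the ratio between quant types: under these assumptions, the step from IQ2_XS to IQ2_XXS saves only about a tenth of the file size.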

Kwissbeats

Mar 28, 2025

edited Mar 28, 2025

> Yeah, it was a conscious decision; I have to put the cutoff somewhere 😅
> What kind of card are you attempting to fit it on where 8.44GB is too big?

I understand, thanks for replying :)

I'd rather not tell, but since you asked 😅: I am currently running QwQ with full context, partly on the CPU and partly on an NVIDIA 1060 with 8 GB of memory.
Most of the time I even reach for q4_m.

Complex coding tasks can take a while, but it mainly fixes my Python/JavaScript syntax and indentation errors.

P.S. This is in no way a request (well, it was, but mostly out of interest in why).
As nkelly said, the size reduction would be small anyway.

Thanks

bartowski

Owner Mar 28, 2025

Hmm, fair. If it were me, at that point I'd probably sacrifice some speed, use only a partial offload, and reach for a higher-quality quant. 27B at IQ2_XXS loses a lot of quality, sadly, and as @nkelly13 mentioned, it would probably still not fit entirely on your GPU anyway.
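
For anyone weighing the partial-offload route suggested above, here is a rough sketch of how one might pick the number of layers to put on the GPU. It is a heuristic, not how llama.cpp itself decides: it assumes all layers are the same size, and `reserve_gb` is a hypothetical safety margin for the KV cache, compute buffers, and whatever else is using the card. The function name and the example numbers are made up for illustration.

```python
def layers_that_fit(model_gb: float, n_layers: int, vram_gb: float,
                    reserve_gb: float = 1.5) -> int:
    """Estimate how many layers fit in VRAM, assuming equal-sized layers."""
    if model_gb <= 0 or n_layers <= 0:
        raise ValueError("model_gb and n_layers must be positive")
    per_layer_gb = model_gb / n_layers
    # VRAM left for weights after setting aside the assumed reserve.
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))


# Hypothetical example: a 16 GB file with 64 layers on an 8 GB card.
print(layers_that_fit(model_gb=16.0, n_layers=64, vram_gb=8.0, reserve_gb=2.0))
```

The result can be used as a starting value for `n_gpu_layers` in llama-cpp-python (passed through `Llama.from_pretrained(...)`) or for `-ngl` with `llama-server` and `llama-cli`, then adjusted up or down depending on whether the model loads without running out of memory.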
