Instructions for using devopsforflops/functiongemma-270m-delia-dispatcher with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- llama-cpp-python
How to use devopsforflops/functiongemma-270m-delia-dispatcher with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="devopsforflops/functiongemma-270m-delia-dispatcher",
    filename="functiongemma-270m-delia-dispatcher-f16.gguf",
)
```
```python
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use devopsforflops/functiongemma-270m-delia-dispatcher with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf devopsforflops/functiongemma-270m-delia-dispatcher:F16

# Run inference directly in the terminal:
llama-cli -hf devopsforflops/functiongemma-270m-delia-dispatcher:F16
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf devopsforflops/functiongemma-270m-delia-dispatcher:F16

# Run inference directly in the terminal:
llama-cli -hf devopsforflops/functiongemma-270m-delia-dispatcher:F16
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf devopsforflops/functiongemma-270m-delia-dispatcher:F16

# Run inference directly in the terminal:
./llama-cli -hf devopsforflops/functiongemma-270m-delia-dispatcher:F16
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf devopsforflops/functiongemma-270m-delia-dispatcher:F16

# Run inference directly in the terminal:
./build/bin/llama-cli -hf devopsforflops/functiongemma-270m-delia-dispatcher:F16
```
Use Docker
```sh
docker model run hf.co/devopsforflops/functiongemma-270m-delia-dispatcher:F16
```
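Once llama-server is running, it exposes an OpenAI-compatible API on port 8080 by default. Below is a minimal sketch of calling it from Python with the requests library; on a single-model server the model field is informational rather than used for routing:

```python
import requests

# llama-server speaks the OpenAI chat-completions protocol on
# port 8080 by default.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "devopsforflops/functiongemma-270m-delia-dispatcher:F16",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```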
- LM Studio
- Jan
- vLLM
How to use devopsforflops/functiongemma-270m-delia-dispatcher with vLLM:
Install from pip and serve model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "devopsforflops/functiongemma-270m-delia-dispatcher"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "devopsforflops/functiongemma-270m-delia-dispatcher",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker
```sh
docker model run hf.co/devopsforflops/functiongemma-270m-delia-dispatcher:F16
```
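Because the vLLM endpoint is OpenAI-compatible, the official openai Python client can be pointed at it instead of curl. A minimal sketch; the api_key is a dummy value, since vLLM requires none by default:

```python
from openai import OpenAI

# Point the official OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

completion = client.chat.completions.create(
    model="devopsforflops/functiongemma-270m-delia-dispatcher",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)
```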
- Ollama
How to use devopsforflops/functiongemma-270m-delia-dispatcher with Ollama:
```sh
ollama run hf.co/devopsforflops/functiongemma-270m-delia-dispatcher:F16
```
- Unsloth Studio
How to use devopsforflops/functiongemma-270m-delia-dispatcher with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for devopsforflops/functiongemma-270m-delia-dispatcher to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for devopsforflops/functiongemma-270m-delia-dispatcher to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for devopsforflops/functiongemma-270m-delia-dispatcher to start chatting
```
- Pi
How to use devopsforflops/functiongemma-270m-delia-dispatcher with Pi:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf devopsforflops/functiongemma-270m-delia-dispatcher:F16
```
Configure the model in Pi
```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add the following to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "devopsforflops/functiongemma-270m-delia-dispatcher:F16" }
      ]
    }
  }
}
```
Run Pi
```sh
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use devopsforflops/functiongemma-270m-delia-dispatcher with Hermes Agent:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf devopsforflops/functiongemma-270m-delia-dispatcher:F16
```
Configure Hermes
```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default devopsforflops/functiongemma-270m-delia-dispatcher:F16
```
Run Hermes
```sh
hermes
```
- Docker Model Runner
How to use devopsforflops/functiongemma-270m-delia-dispatcher with Docker Model Runner:
```sh
docker model run hf.co/devopsforflops/functiongemma-270m-delia-dispatcher:F16
```
- Lemonade
How to use devopsforflops/functiongemma-270m-delia-dispatcher with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull devopsforflops/functiongemma-270m-delia-dispatcher:F16
```
Run and chat with the model
```sh
lemonade run user.functiongemma-270m-delia-dispatcher-F16
```
List all available models
```sh
lemonade list
```
FunctionGemma 270M - Delia Dispatcher
A fine-tuned version of google/functiongemma-270m-it for Delia LLM orchestration.
This tiny model (270M params) acts as a fast dispatcher, routing user requests to the appropriate backend:
- `call_coder` - Code generation tasks
- `call_reviewer` - Code review and analysis
- `call_planner` - Architecture and planning (also handles ambiguous requests)
- `call_executor` - Running commands and scripts
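To make the routing concrete, here is a minimal dispatch sketch in Python. The handler functions are hypothetical placeholders standing in for Delia's real backends; only the four tool names come from this card:

```python
# Hypothetical backend handlers; in Delia these would invoke the real
# coder, reviewer, planner, and executor backends.
def handle_coder(reasoning: str) -> str:
    return f"[coder] {reasoning}"

def handle_reviewer(reasoning: str) -> str:
    return f"[reviewer] {reasoning}"

def handle_planner(reasoning: str) -> str:
    return f"[planner] {reasoning}"

def handle_executor(reasoning: str) -> str:
    return f"[executor] {reasoning}"

# Map the dispatcher's tool names to backends.
BACKENDS = {
    "call_coder": handle_coder,
    "call_reviewer": handle_reviewer,
    "call_planner": handle_planner,
    "call_executor": handle_executor,
}

def route(tool_call: dict) -> str:
    """Route a parsed tool call, e.g.
    {"name": "call_coder", "arguments": {"reasoning": "..."}},
    to its backend."""
    return BACKENDS[tool_call["name"]](tool_call["arguments"]["reasoning"])
```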
Key Features
- Minimalist schema: Single `reasoning` parameter per tool
- Thought tokens: Brief CoT scratchpad before tool calls
- EOS hardening: Explicit stop tokens prevent infinite loops
- Negative samples: 13% ambiguous examples → planner (graceful handling)
- GBNF grammar: Constrained decoding for 100% valid output
Usage
With llama.cpp (recommended for speed)
```sh
# Download the GGUF
wget https://huggingface.co/devopsforflops/functiongemma-270m-delia-dispatcher/resolve/main/functiongemma-270m-delia-dispatcher-f16.gguf

# Download the grammar
wget https://huggingface.co/devopsforflops/functiongemma-270m-delia-dispatcher/resolve/main/dispatcher.gbnf

# Run with grammar constraint
./llama-cli -m functiongemma-270m-delia-dispatcher-f16.gguf \
  --grammar-file dispatcher.gbnf \
  -p "<start_of_turn>user
Write a fibonacci function<end_of_turn>
<start_of_turn>model"
```
With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("devopsforflops/functiongemma-270m-delia-dispatcher")
tokenizer = AutoTokenizer.from_pretrained("devopsforflops/functiongemma-270m-delia-dispatcher")

prompt = """<start_of_turn>user
Review this code for bugs<end_of_turn>
<start_of_turn>model"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```
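The grammar-constrained setup also works from Python via llama-cpp-python, which can pull the GGUF straight from the Hub. A minimal sketch, assuming dispatcher.gbnf has been downloaded as in the llama.cpp example above:

```python
from llama_cpp import Llama, LlamaGrammar

llm = Llama.from_pretrained(
    repo_id="devopsforflops/functiongemma-270m-delia-dispatcher",
    filename="functiongemma-270m-delia-dispatcher-f16.gguf",
)

# Constrain decoding with the repo's GBNF grammar so the output is
# always a well-formed tool call.
grammar = LlamaGrammar.from_file("dispatcher.gbnf")

out = llm(
    "<start_of_turn>user\nWrite a fibonacci function<end_of_turn>\n<start_of_turn>model",
    grammar=grammar,
    max_tokens=100,
)
print(out["choices"][0]["text"])
```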
Output Format
```
<start_of_turn>user
{request}<end_of_turn>
<start_of_turn>model
thought
{brief reasoning}
<tool_call>{"name": "call_X", "arguments": {"reasoning": "..."}}</tool_call><end_of_turn>
```
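A minimal sketch of parsing this format in Python; it assumes exactly one tool call per completion, which matches the single-dispatch format shown above:

```python
import json
import re

def parse_dispatch(output: str) -> tuple[str, dict]:
    """Split a dispatcher completion into its thought text and tool call."""
    match = re.search(r"<tool_call>(\{.*?\})</tool_call>", output, re.DOTALL)
    if match is None:
        raise ValueError("no <tool_call> block in model output")
    # Everything before the tool call is the "thought" scratchpad.
    thought = output[: match.start()].removeprefix("thought").strip()
    return thought, json.loads(match.group(1))

thought, call = parse_dispatch(
    'thought\nThis is a code generation task.\n'
    '<tool_call>{"name": "call_coder", "arguments": {"reasoning": "..."}}</tool_call>'
)
print(call["name"])  # -> call_coder
```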
Training
Fine-tuned with Unsloth using LoRA:
- Epochs: 3
- LoRA rank: 32
- Training examples: 92 (balanced across 4 tools + 13% ambiguous)
- Final loss: 0.46
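For reference, a rough reproduction sketch with Unsloth and TRL using the hyperparameters above. Everything not listed in this card (lora_alpha, sequence length, batch size, and the chat templating of train.jsonl) is an assumption, and the exact data formatting step is omitted:

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/functiongemma-270m-it",  # base model from this card
    max_seq_length=2048,                        # assumption
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,           # LoRA rank, from this card
    lora_alpha=32,  # assumption; not stated in the card
)

dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=TrainingArguments(
        num_train_epochs=3,             # from this card
        per_device_train_batch_size=2,  # assumption
        output_dir="outputs",
    ),
)
trainer.train()
```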
Files
| File | Description |
|---|---|
| `functiongemma-270m-delia-dispatcher-f16.gguf` | GGUF model (F16, 518MB) |
| `model.safetensors` | Transformers model |
| `dispatcher.gbnf` | GBNF grammar for constrained decoding |
| `dispatcher_tools.json` | Tool schema (4 tools) |
| `train.jsonl` | Training data |
License
Apache 2.0 (same as base model)
Part of Delia
This model is designed for use with Delia, an LLM orchestration system that routes requests to optimal backends.