Sunflower 14B - GGUF

GGUF quantized versions of the Sunflower model for Ugandan language translation tasks.

Model Details

Base Model: Sunbird/Sunflower-14B
Model Size: 14B parameters
Architecture: Qwen3 (fine-tuned from Qwen/Qwen3-14B)
Quantization: llama.cpp quantization (K-quant and IQ formats) with importance matrix calibration
Languages: English, Luganda, and other Ugandan languages

Available Files

Recommended Quantizations

| Filename | Quant type | File size | Description |
|---|---|---|---|
| sunflower-14B-f16.gguf | F16 | 28GB | Original precision |
| sunflower-14B-q8_0.gguf | Q8_0 | 15GB | Highest quality quantized |
| sunflower-14B-q6_k.gguf | Q6_K | 12GB | High quality |
| sunflower-14B-q5_k_m.gguf | Q5_K_M | 9.8GB | Balanced quality/size |
| sunflower-14B-q5_k_s.gguf | Q5_K_S | 9.6GB | Smaller Q5 variant |
| sunflower-14B-q4_k_m.gguf | Q4_K_M | 8.4GB | Recommended for most users |
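
As a rough aid for choosing among the files listed above, the following sketch picks the largest quantization whose file fits within a given memory budget. The function name `pick_quant` and the `headroom_gb` parameter are hypothetical, introduced only for illustration; the sizes are copied from the table. File size is only a lower bound on memory use, since the context cache and runtime overhead come on top, so treat the result as a starting point rather than a guarantee.

```python
# File sizes in GB, copied from the table of recommended quantizations.
QUANT_SIZES_GB = {
    "F16": 28.0,
    "Q8_0": 15.0,
    "Q6_K": 12.0,
    "Q5_K_M": 9.8,
    "Q5_K_S": 9.6,
    "Q4_K_M": 8.4,
}


def pick_quant(budget_gb, headroom_gb=0.0):
    """Return the largest quantization whose file fits in the budget, or None.

    headroom_gb is an assumed allowance for the context cache and runtime
    overhead; the right value depends on your setup.
    """
    usable = budget_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= usable}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# Example: 16 GB of memory, keeping 2 GB free for everything else.
print(pick_quant(16, headroom_gb=2))
```

With 16 GB and 2 GB of headroom this selects Q6_K; with less than 8.4 GB available it returns `None`, which is where the experimental quantizations below come in.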

Warning: Experimental Quantizations

The following quantizations achieve extreme compression but may significantly impact translation quality. Use for research and experimentation only.

| Filename | Quant type | File size | Compression | Warning |
|---|---|---|---|---|
| sunflower-14B-iq2_xxs.gguf | IQ2_XXS | 4.1GB | 85% smaller | May lose translation accuracy |
| sunflower-14B-tq1_0.gguf | TQ1_0 | 3.7GB | 87% smaller | Experimental ternary quantization |
| sunflower-14B-iq1_s.gguf | IQ1_S | 3.4GB | 88% smaller | Extreme compression, quality heavily impacted |

Note: The experimental quantizations (IQ1_S, IQ2_XXS, TQ1_0) use advanced compression techniques that may not preserve the specialized knowledge for Ugandan language translation. Test thoroughly before production use.
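
The "Compression" figures appear to be measured against the 28GB F16 file. That baseline is an inference from the numbers, not something the card states. A short check under that assumption:

```python
# Sizes in GB, copied from the tables in this card.
F16_GB = 28.0
EXPERIMENTAL_GB = {"IQ2_XXS": 4.1, "TQ1_0": 3.7, "IQ1_S": 3.4}


def percent_smaller(size_gb, baseline_gb=F16_GB):
    """Size reduction relative to the baseline, rounded to a whole percent."""
    return round(100 * (1 - size_gb / baseline_gb))


for name, size in EXPERIMENTAL_GB.items():
    print(f"{name}: {percent_smaller(size)}% smaller than F16")
```

The results (85%, 87%, 88%) match the table, which supports reading the column as "relative to F16" rather than relative to Q4_K_M or Q8_0.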

Additional Files

| Filename | Description |
|---|---|
| sunflower-imatrix.dat | Importance matrix data used for quantization |

Usage

llama.cpp

# Download model
huggingface-cli download Sunbird/Sunflower-14B-GGUF sunflower-14B-q4_k_m.gguf --local-dir .

# Run inference
./llama-cli -m sunflower-14B-q4_k_m.gguf -p "Translate to Luganda: Hello, how are you today?"
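
The same download can be scripted from Python. This is a minimal sketch that assumes the `huggingface_hub` package is installed and that every file follows the `sunflower-14B-<quant>.gguf` naming seen in the tables above; the helper names `gguf_filename` and `download_quant` are hypothetical.

```python
REPO_ID = "Sunbird/Sunflower-14B-GGUF"


def gguf_filename(quant_type):
    """Map a quant type such as "Q4_K_M" to the filename used in this repo."""
    return f"sunflower-14B-{quant_type.lower()}.gguf"


def download_quant(quant_type, local_dir="."):
    """Download one GGUF file and return its local path (needs network access)."""
    # Imported lazily so the filename helper works without the package installed.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(quant_type),
        local_dir=local_dir,
    )


print(gguf_filename("Q4_K_M"))
# To fetch the file, call: download_quant("Q4_K_M")
```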

Ollama Integration

Ollama runs the quantized GGUF files locally and exposes them through a simple HTTP API.

Installation and Setup

# Install Ollama (Linux)
curl -fsSL https://ollama.com/install.sh | sh

# On macOS or Windows, download the installer from https://ollama.com

# Start the Ollama server manually if it is not already running
ollama serve

Creating Modelfiles for Different Quantizations

Q4_K_M (Recommended) - Modelfile:

cat > Modelfile.q4 << 'EOF'
FROM ./sunflower-14B-q4_k_m.gguf

# System prompt for your specific use case
SYSTEM """You are a linguist and translator specializing in Ugandan languages, made by Sunbird AI."""

# Chat template (adjust for your base model architecture)
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}<|im_end|>"""

# Stop tokens
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"

# Generation parameters
PARAMETER temperature 0.3
PARAMETER top_p 0.95
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 4096
PARAMETER num_predict 500
EOF
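
To make the TEMPLATE above concrete, here is a small sketch of the prompt text it yields at generation time, when the response is still empty. The function `render_chatml` is a hypothetical stand-in for Ollama's template engine, written only to show the resulting ChatML layout.

```python
def render_chatml(system, prompt):
    """Build the ChatML prompt that the Modelfile TEMPLATE expands to.

    The string ends right after the assistant header, which is where the
    model starts generating; "<|im_end|>" then acts as the stop token.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


print(render_chatml(
    "You are a linguist and translator specializing in Ugandan languages, made by Sunbird AI.",
    "Translate to Luganda: Hello, how are you today?",
))
```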

Experimental IQ1_S - Modelfile:

cat > Modelfile.iq1s << 'EOF'
FROM ./sunflower-14B-iq1_s.gguf

SYSTEM """You are a translator for Ugandan languages. Note: This is an experimental ultra-compressed model - quality may be limited."""

# Same chat template as above
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}<|im_end|>"""

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.3
PARAMETER top_p 0.95
# Smaller context for experimental model
PARAMETER num_ctx 2048
EOF
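
The import step further down also refers to Modelfile.q5 and Modelfile.q6, which are not spelled out here. One way to produce them consistently is to generate the text from a single function. This sketch assumes the GGUF files sit in the current directory under their repository names; `render_modelfile` is a hypothetical helper, not part of Ollama.

```python
CHATML_TEMPLATE = """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}<|im_end|>"""

SYSTEM_PROMPT = (
    "You are a linguist and translator specializing in Ugandan languages, "
    "made by Sunbird AI."
)


def render_modelfile(gguf_path, system=SYSTEM_PROMPT, num_ctx=4096, temperature=0.3):
    """Return Modelfile text mirroring the Q4_K_M example for another GGUF file."""
    lines = [
        f"FROM {gguf_path}",
        f'SYSTEM """{system}"""',
        f'TEMPLATE """{CHATML_TEMPLATE}"""',
        'PARAMETER stop "<|im_start|>"',
        'PARAMETER stop "<|im_end|>"',
        f"PARAMETER temperature {temperature}",
        "PARAMETER top_p 0.95",
        f"PARAMETER num_ctx {num_ctx}",
    ]
    return "\n".join(lines) + "\n"


# Print the Modelfile for Q5_K_M; redirect to Modelfile.q5 to use it.
print(render_modelfile("./sunflower-14B-q5_k_m.gguf"))
```

Writing the output to `Modelfile.q5` (and likewise for Q6_K) gives `ollama create` the files it expects.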

Importing Models to Ollama

# Import Q4_K_M model (recommended)
ollama create sunflower-14b:q4 -f Modelfile.q4

# Import experimental IQ1_S model
ollama create sunflower-14b:iq1s -f Modelfile.iq1s

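# Import other quantizations (create Modelfile.q5 and Modelfile.q6 first,
# following the same pattern with the matching .gguf file)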
ollama create sunflower-14b:q5 -f Modelfile.q5
ollama create sunflower-14b:q6 -f Modelfile.q6

# Verify models are imported
ollama list

Expected output (IDs and timestamps will differ):

NAME                    ID              SIZE    MODIFIED
sunflower-14b:q4        abc123def       8.4GB   2 minutes ago
sunflower-14b:iq1s      def456ghi       3.4GB   1 minute ago

Using Ollama Models

Interactive Chat:

# Start interactive session with Q4 model
ollama run sunflower-14b:q4

# Example conversation:
# >>> Translate to Luganda: Hello, how are you today?
# >>> Give a dictionary definition of the Samia term "ovulwaye" in English
# >>> /bye (to exit)

# Start with experimental model
ollama run sunflower-14b:iq1s

Single Prompt Inference:

# Quick translation with Q4 model
ollama run sunflower-14b:q4 "Translate to Luganda: People in villages rarely accept new technologies."

# Test experimental model
ollama run sunflower-14b:iq1s "Translate to Luganda: Good morning"

# Dictionary definition
ollama run sunflower-14b:q4 'Give a dictionary definition of the Samia term "ovulwaye" in English'

Ollama API Usage

Check the API Server:

# Ollama automatically serves API on http://localhost:11434
# Test API endpoint
curl http://localhost:11434/api/version
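
Beyond the version check, a translation request can be sent to Ollama's `/api/generate` endpoint. The sketch below uses only the Python standard library and assumes the `sunflower-14b:q4` model created earlier and the default port; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt, model="sunflower-14b:q4"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="sunflower-14b:q4", timeout=300):
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# With the server running:
# print(generate("Translate to Luganda: Good morning"))
```

Setting `"stream": False` returns a single JSON object; without it the endpoint streams one JSON object per line, which needs different parsing.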

Python (llama-cpp-python)

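from llama_cpp import Llama

# n_ctx sets the context window; 4096 matches the Q4_K_M Modelfile above
llm = Llama(model_path="sunflower-14B-q4_k_m.gguf", n_ctx=4096)
# max_tokens defaults to 16 in llama-cpp-python, which truncates most translations
result = llm("Translate to Luganda: How are you?", max_tokens=256)
print(result['choices'][0]['text'])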
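
For chat-style use with a system prompt, `create_chat_completion` may be a better fit than a raw completion call. This is a sketch that reuses the system prompt from the Ollama Modelfile and assumes the GGUF file carries a usable chat template in its metadata; `build_messages` and `chat` are hypothetical helper names.

```python
SYSTEM_PROMPT = (
    "You are a linguist and translator specializing in Ugandan languages, "
    "made by Sunbird AI."
)


def build_messages(user_text, system=SYSTEM_PROMPT):
    """Return an OpenAI-style message list with a system and a user turn."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]


def chat(model_path, user_text, max_tokens=256):
    """Load the model and return the assistant reply (needs the GGUF file locally)."""
    # Imported lazily so build_messages can be used without llama-cpp-python.
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(model_path=model_path, n_ctx=4096, verbose=False)
    output = llm.create_chat_completion(
        messages=build_messages(user_text),
        max_tokens=max_tokens,
        temperature=0.3,
    )
    return output["choices"][0]["message"]["content"]


# With the model file downloaded:
# print(chat("sunflower-14B-q4_k_m.gguf", "Translate to Luganda: How are you?"))
```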

Performance Notes

Q4_K_M: Recommended for most use cases
Q5_K_M: Better quality with moderate size increase
Q6_K: High quality for production use
Q8_0: Near-lossless quality

Technical Details

Quantized with llama.cpp, using importance matrix (imatrix) calibration to reduce quality loss at lower bit widths. The calibration data is published as sunflower-imatrix.dat.

License

Apache 2.0
