Instructions for using bombman/MiniCPM5-1B-GGUF with libraries and local apps.

Libraries

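How to use bombman/MiniCPM5-1B-GGUF with llama-cpp-python:

```python
# pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

# Download the IQ4_XS file from the Hugging Face Hub (cached after the first run) and load it
llm = Llama.from_pretrained(
    repo_id="bombman/MiniCPM5-1B-GGUF",
    filename="MiniCPM5-1B-IQ4_XS.gguf",
)

# messages must be a list of role/content dicts, not a plain string
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain what an AI agent is."}
    ]
)
print(response["choices"][0]["message"]["content"])
```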

Local Apps

llama.cpp

How to use bombman/MiniCPM5-1B-GGUF with llama.cpp:

Install with Homebrew

```
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
```

Install with WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
```

Use a pre-built binary

```
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
```

Build from source

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
```

Use Docker

```
docker model run hf.co/bombman/MiniCPM5-1B-GGUF:Q4_K_M
```

Ollama
How to use bombman/MiniCPM5-1B-GGUF with Ollama:
```
ollama run hf.co/bombman/MiniCPM5-1B-GGUF:Q4_K_M
```

Unsloth Studio

How to use bombman/MiniCPM5-1B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for bombman/MiniCPM5-1B-GGUF to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for bombman/MiniCPM5-1B-GGUF to start chatting
```

Use Hugging Face Spaces

No setup is required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for bombman/MiniCPM5-1B-GGUF to start chatting.

Pi

How to use bombman/MiniCPM5-1B-GGUF with Pi:

Start the llama.cpp server

```
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
```

Configure the model in Pi

Install Pi:

```
npm install -g @mariozechner/pi-coding-agent
```

Add the provider to ~/.pi/agent/models.json:

```
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "bombman/MiniCPM5-1B-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
```
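If ~/.pi/agent/models.json already lists other providers, pasting the snippet over the file would drop them. A minimal Python sketch that merges the llama-cpp entry into an existing file instead, assuming the layout shown above (a top-level "providers" object). The helper `add_llama_cpp_provider` is hypothetical and not part of Pi.

```python
import copy
import json
from pathlib import Path

# Provider entry copied from the example above (layout assumed, not taken from Pi's docs)
PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "bombman/MiniCPM5-1B-GGUF:Q4_K_M"}],
}


def add_llama_cpp_provider(config_path: Path) -> dict:
    """Hypothetical helper: merge the llama-cpp provider into models.json, keeping other providers."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # setdefault keeps any providers that are already configured
    config.setdefault("providers", {})["llama-cpp"] = copy.deepcopy(PROVIDER)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

To update the real file, call `add_llama_cpp_provider(Path.home() / ".pi" / "agent" / "models.json")`.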

Run Pi

```
# Start Pi in your project directory:
pi
```

Hermes Agent

How to use bombman/MiniCPM5-1B-GGUF with Hermes Agent:

Start the llama.cpp server

```
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
```

Configure Hermes

```
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default bombman/MiniCPM5-1B-GGUF:Q4_K_M
```

Run Hermes

```
hermes
```

Docker Model Runner
How to use bombman/MiniCPM5-1B-GGUF with Docker Model Runner:
```
docker model run hf.co/bombman/MiniCPM5-1B-GGUF:Q4_K_M
```

Lemonade

How to use bombman/MiniCPM5-1B-GGUF with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull bombman/MiniCPM5-1B-GGUF:Q4_K_M
```

Run and chat with the model

```
lemonade run user.MiniCPM5-1B-GGUF-Q4_K_M
```

List all available models

```
lemonade list
```

MiniCPM5-1B GGUF

Optimized GGUF quantizations of MiniCPM5-1B for local inference, AI agents, coding assistants, and edge deployment.

Overview

This repository provides GGUF versions of MiniCPM5-1B, converted and validated with a recent llama.cpp conversion pipeline.

The goal is not to replace large language models.

The goal is to reduce how often you need them.

MiniCPM5-1B is small enough to run almost anywhere while remaining useful for:

AI Agents
Tool Calling
Intent Classification
Workflow Planning
Local Coding Assistants
Edge AI Deployment
Automation Systems
Local Development Tools

Available Quantizations

| File | Use Case |
|---|---|
| MiniCPM5-1B-f16.gguf | Maximum quality |
| MiniCPM5-1B-Q8_0.gguf | High-quality inference |
| MiniCPM5-1B-Q4_K_M.gguf | Recommended daily driver |
| MiniCPM5-1B-Q4_K_S.gguf | Faster inference |
| MiniCPM5-1B-IQ4_XS.gguf | Edge devices and low-memory systems |

Why This Repository?

Many GGUF repositories simply convert and upload files.

This repository focuses on practical deployment.

Every release is:

Converted with a recent llama.cpp release
Metadata verified
Runtime tested
Organized for easy deployment
Designed for local AI workflows

Recommended Usage

MiniCPM5-1B performs best as a lightweight component in a larger AI system.

Agent Router

Classify user requests and route them to the appropriate model or tool.
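A minimal sketch of the router pattern, assuming the model is prompted to reply with a single label. The `complete` callable stands in for whichever client you use (llama-cpp-python, llama-server, or Ollama); the route names and prompt wording are hypothetical. Small models sometimes add punctuation or extra words, so the label is normalized and unknown answers fall back to a default route.

```python
from typing import Callable, Dict

# Hypothetical routes: label -> where the request should go
ROUTES: Dict[str, str] = {
    "code": "send to a coding model",
    "search": "call a web-search tool",
    "chat": "answer locally with MiniCPM5-1B",
}
DEFAULT_ROUTE = "chat"


def build_router_prompt(user_request: str) -> str:
    """Ask for exactly one label from ROUTES."""
    labels = ", ".join(ROUTES)
    return (
        f"Classify the request into exactly one of: {labels}.\n"
        f"Reply with the label only.\n\nRequest: {user_request}\nLabel:"
    )


def route(user_request: str, complete: Callable[[str], str]) -> str:
    """Ask the model for a label and map it onto a known route."""
    raw = complete(build_router_prompt(user_request))
    # Keep the first word, strip punctuation and lowercase it: "Code." -> "code"
    words = raw.strip().split()
    label = words[0].strip(".,:;!\"'").lower() if words else ""
    return label if label in ROUTES else DEFAULT_ROUTE
```

For example, `route("Write a Rust web server", complete)` returns "code" when the model answers "Code.", and "chat" when it answers something outside the label set.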

Tool Caller

Generate structured tool calls before handing execution to external systems.
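Before handing a call to an external system, it helps to validate what the model produced. A minimal sketch, assuming the model was prompted to emit a JSON object of the form {"name": ..., "arguments": {...}}; that schema and the `get_weather` tool are hypothetical illustrations, not a format defined by MiniCPM5-1B. The first JSON object naming a known tool is extracted with `json.JSONDecoder.raw_decode`, so surrounding chatter is tolerated.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> str:
    """Hypothetical stub tool."""
    return f"Weather lookup for {city} (stub)"


# Hypothetical tool registry: tool name -> Python function
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def parse_tool_call(text: str) -> Dict[str, Any]:
    """Find the first JSON object in the model output that names a known tool."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
            continue
        if isinstance(obj, dict) and obj.get("name") in TOOLS:
            args = obj.get("arguments", {})
            if not isinstance(args, dict):
                raise ValueError("arguments must be a JSON object")
            return {"name": obj["name"], "arguments": args}
        start = text.find("{", start + 1)
    raise ValueError("no valid tool call found")


def dispatch(text: str) -> Any:
    """Validate the model's tool call, then run the matching Python function."""
    call = parse_tool_call(text)
    return TOOLS[call["name"]](**call["arguments"])
```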

Local Coding Assistant

Provide fast code suggestions and lightweight programming support.

Intent Classifier

Handle command parsing, task classification, and workflow orchestration.

Edge AI

Deploy on:

Mini PCs
SBCs
Embedded Systems
Local Servers
Privacy-Focused Environments

Using with llama.cpp

Interactive Chat

```
llama-cli \
  -m MiniCPM5-1B-Q4_K_M.gguf
```

OpenAI-Compatible API Server

```
llama-server \
  -m MiniCPM5-1B-Q4_K_M.gguf \
  --host 0.0.0.0 \
  --port 8080
```

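The server listens on http://localhost:8080. OpenAI-compatible clients should use the base URL http://localhost:8080/v1 (for example, POST /v1/chat/completions). Binding to 0.0.0.0 makes the server reachable from other machines on your network; use 127.0.0.1 to keep it local.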
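A minimal sketch of calling the server from Python with only the standard library, assuming llama-server is running on localhost:8080 and serving the OpenAI-style chat completions route. The helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumed llama-server address from the command above


def build_chat_request(prompt: str, temperature: float = 0.7) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Explain what an AI agent is."))` prints the model's answer.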

Using with Ollama

1. Create Model Directory

```
mkdir MiniCPM5-Ollama
cd MiniCPM5-Ollama
```

Copy the GGUF file you downloaded from this repository into the directory:

```
cp ../MiniCPM5-1B-Q4_K_M.gguf .
```

2. Create Modelfile

Create a file named Modelfile with content such as:

```
FROM ./MiniCPM5-1B-Q4_K_M.gguf

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 8192

SYSTEM """
You are a helpful AI assistant.
"""
```

3. Build Ollama Model

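```
ollama create minicpm5-1b -f Modelfile
```

Verify the installation:

```
ollama list
```

The list should include an entry similar to the following (ID and MODIFIED columns omitted):

```
NAME                  SIZE
minicpm5-1b:latest    0.7 GB
```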

4. Run Interactive Chat

```
ollama run minicpm5-1b
```

Example prompt:

```
>>> Explain Rust ownership in simple terms.
```

5. Run with Custom Prompt

```
ollama run minicpm5-1b "Write a Python function to reverse a string."
```

6. Generate via API

Start Ollama:

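```
ollama serve
```

Skip this step if Ollama is already running as a desktop app or background service.

Generate text:

```
curl http://localhost:11434/api/generate \
-d '{
  "model":"minicpm5-1b",
  "prompt":"Explain what an AI Agent is."
}'
```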

7. Chat API

```
curl http://localhost:11434/api/chat \
-d '{
  "model":"minicpm5-1b",
  "messages":[
    {
      "role":"user",
      "content":"Write a simple Rust web server."
    }
  ]
}'
```
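The /api/chat endpoint does not remember earlier turns, so a multi-turn conversation resends the full history on every request. A minimal sketch that keeps that history and builds each request body, assuming the message format shown above; the `Conversation` class is hypothetical, and "stream" is set to false so the server returns a single JSON object per request.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Conversation:
    """Hypothetical helper that keeps chat history for Ollama's /api/chat."""
    model: str = "minicpm5-1b"
    system: str = ""
    messages: List[Dict[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.system:
            self.messages.append({"role": "system", "content": self.system})

    def request_body(self, user_text: str) -> Dict[str, Any]:
        """Record the user's turn and return the JSON body to POST to /api/chat."""
        self.messages.append({"role": "user", "content": user_text})
        # Copy the list so later turns don't change a body that was already built
        return {"model": self.model, "messages": list(self.messages), "stream": False}

    def record_reply(self, response_json: Dict[str, Any]) -> str:
        """Store the assistant message from a non-streaming /api/chat response."""
        content = response_json["message"]["content"]
        self.messages.append({"role": "assistant", "content": content})
        return content
```

With `requests`, one turn looks like `conv.record_reply(requests.post("http://localhost:11434/api/chat", json=conv.request_body("Hi"), timeout=120).json())`.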

8. Python Example

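```python
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "minicpm5-1b",
        "prompt": "Explain tool calling.",
        # /api/generate streams newline-delimited JSON by default;
        # disable streaming so response.json() sees a single object
        "stream": False,
    },
    timeout=120,
)
response.raise_for_status()

print(response.json()["response"])
```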
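Ollama's /api/generate streams by default: the body is newline-delimited JSON, each object carries a "response" fragment, and the last one has "done": true. A minimal sketch that assembles a streamed reply, assuming that line format; `assemble_stream` is a hypothetical helper that accepts any iterable of lines.

```python
import json
from typing import Iterable, List, Union


def assemble_stream(lines: Iterable[Union[str, bytes]]) -> str:
    """Join the 'response' fragments from Ollama's newline-delimited JSON stream."""
    parts: List[str] = []
    for line in lines:
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

With `requests`, pass `stream=True` to `requests.post` (and leave "stream" out of the JSON body), then feed `response.iter_lines()` into `assemble_stream`.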

9. OpenAI SDK Compatibility

Ollama provides an OpenAI-compatible API.

```
pip install openai
```

Python example:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # required by the SDK, ignored by Ollama
)

response = client.chat.completions.create(
    model="minicpm5-1b",
    messages=[
        {
            "role": "user",
            "content": "Explain AI agents."
        }
    ]
)

print(response.choices[0].message.content)
```

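This allows MiniCPM5-1B to be used from tools that accept a custom OpenAI-compatible endpoint, such as:

LangGraph
CrewAI
Flowise
OpenHands
OpenManus
n8n
Dify
AutoGen
Custom Agent Frameworks

In most cases you only point them at the base URL http://localhost:11434/v1 and the model name minicpm5-1b, rather than changing application code.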

10. Open WebUI

Start Open WebUI:

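```
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```

Then open http://localhost:3000 in your browser.

Open WebUI looks for Ollama on the host at port 11434 (http://localhost:11434 from the host's point of view). The --add-host flag makes host.docker.internal resolve to the host inside the container so that connection can be made.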

Example Agent-Oriented Modelfile

```
FROM ./MiniCPM5-1B-Q4_K_M.gguf

PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER num_ctx 8192

SYSTEM """
You are an AI agent specialized in:

- Task planning
- Tool selection
- Intent classification
- Workflow orchestration

Respond clearly and efficiently.
"""
```

Recommended Settings

AI Agents

temperature = 0.2
top_p = 0.9
num_ctx = 8192

Coding

temperature = 0.1
top_p = 0.95
num_ctx = 8192

General Chat

temperature = 0.7
top_p = 0.9
num_ctx = 8192
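The three presets map directly onto Modelfile PARAMETER lines. A minimal sketch that renders a Modelfile for a chosen preset, assuming the Q4_K_M file sits next to the Modelfile; the function `render_modelfile` and the preset keys are hypothetical.

```python
from typing import Dict, Union

# Presets copied from the recommended settings above
PRESETS: Dict[str, Dict[str, Union[int, float]]] = {
    "agent": {"temperature": 0.2, "top_p": 0.9, "num_ctx": 8192},
    "coding": {"temperature": 0.1, "top_p": 0.95, "num_ctx": 8192},
    "chat": {"temperature": 0.7, "top_p": 0.9, "num_ctx": 8192},
}


def render_modelfile(preset: str,
                     gguf_path: str = "./MiniCPM5-1B-Q4_K_M.gguf",
                     system: str = "You are a helpful AI assistant.") -> str:
    """Return Modelfile text (FROM, PARAMETER and SYSTEM lines) for one preset."""
    lines = [f"FROM {gguf_path}", ""]
    for name, value in PRESETS[preset].items():
        lines.append(f"PARAMETER {name} {value}")
    lines += ["", 'SYSTEM """', system, '"""', ""]
    return "\n".join(lines)
```

Write the output of `render_modelfile("coding")` to a file named Modelfile, then build it with `ollama create minicpm5-1b-coding -f Modelfile`.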

Hardware Requirements

Approximate memory requirements:

| Quant | Approx. memory |
|---|---|
| IQ4_XS | ~0.5 GB |
| Q4_K_S | ~0.6 GB |
| Q4_K_M | ~0.7 GB |
| Q8_0 | ~1.1 GB |
| F16 | ~2.5 GB |
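To turn the table into a decision, a minimal sketch that picks the highest-precision quant whose approximate footprint fits a memory budget. The figures are the rough ones from the table; the `headroom` multiplier for KV cache and runtime overhead is an assumption, and real usage grows with num_ctx.

```python
from typing import List, Optional, Tuple

# Approximate memory (GB) from the table, ordered from highest to lowest precision
QUANT_MEMORY_GB: List[Tuple[str, float]] = [
    ("F16", 2.5),
    ("Q8_0", 1.1),
    ("Q4_K_M", 0.7),
    ("Q4_K_S", 0.6),
    ("IQ4_XS", 0.5),
]


def pick_quant(available_gb: float, headroom: float = 1.5) -> Optional[str]:
    """Return the most precise quant whose footprint times `headroom` fits, else None.

    `headroom` is an assumed multiplier covering KV cache and runtime overhead.
    """
    for name, gb in QUANT_MEMORY_GB:
        if gb * headroom <= available_gb:
            return name
    return None
```

Even when a larger file fits, Q4_K_M may still be the better trade-off when speed matters.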

Recommended Quant

For most users:

MiniCPM5-1B-Q4_K_M.gguf

Provides the best balance between:

Memory Usage
Speed
Quality

Acknowledgements

Original model:

OpenBMB - MiniCPM5-1B

Converted to the GGUF format with llama.cpp.

All credit for model architecture and training belongs to the original authors.

License

Please follow the original MiniCPM5 license and usage terms.


Format: GGUF
Model size: 1B params
Architecture: llama
Bit widths: 4-bit, 8-bit, 16-bit


Base model: openbmb/MiniCPM5-1B