Instructions to use bombman/MiniCPM5-1B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use bombman/MiniCPM5-1B-GGUF with llama-cpp-python:
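# !pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

# Download the GGUF file from the Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
    repo_id="bombman/MiniCPM5-1B-GGUF",
    filename="MiniCPM5-1B-IQ4_XS.gguf",
)

# messages must be a list of role/content dictionaries, not a plain string.
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain what an AI agent is."}
    ]
)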
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use bombman/MiniCPM5-1B-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
Use Docker
docker model run hf.co/bombman/MiniCPM5-1B-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use bombman/MiniCPM5-1B-GGUF with Ollama:
ollama run hf.co/bombman/MiniCPM5-1B-GGUF:Q4_K_M
- Unsloth Studio
How to use bombman/MiniCPM5-1B-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for bombman/MiniCPM5-1B-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for bombman/MiniCPM5-1B-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for bombman/MiniCPM5-1B-GGUF to start chatting
- Pi
How to use bombman/MiniCPM5-1B-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "bombman/MiniCPM5-1B-GGUF:Q4_K_M" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use bombman/MiniCPM5-1B-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf bombman/MiniCPM5-1B-GGUF:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default bombman/MiniCPM5-1B-GGUF:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use bombman/MiniCPM5-1B-GGUF with Docker Model Runner:
docker model run hf.co/bombman/MiniCPM5-1B-GGUF:Q4_K_M
- Lemonade
How to use bombman/MiniCPM5-1B-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull bombman/MiniCPM5-1B-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.MiniCPM5-1B-GGUF-Q4_K_M
List all available models
lemonade list
MiniCPM5-1B GGUF
Optimized GGUF quantizations of MiniCPM5-1B for local inference, AI agents, coding assistants, and edge deployment.
Overview
This repository provides GGUF conversions of MiniCPM5-1B, produced with the llama.cpp conversion pipeline that was current at release time and validated before upload.
The goal is not to replace large language models.
The goal is to reduce how often you need them.
MiniCPM5-1B is small enough to run almost anywhere while remaining useful for:
- AI Agents
- Tool Calling
- Intent Classification
- Workflow Planning
- Local Coding Assistants
- Edge AI Deployment
- Automation Systems
- Local Development Tools
Available Quantizations
| File | Use Case |
|---|---|
| MiniCPM5-1B-f16.gguf | Maximum Quality |
| MiniCPM5-1B-Q8_0.gguf | High Quality Inference |
| MiniCPM5-1B-Q4_K_M.gguf | Recommended Daily Driver |
| MiniCPM5-1B-Q4_K_S.gguf | Faster Inference |
| MiniCPM5-1B-IQ4_XS.gguf | Edge Devices & Low Memory Systems |
Why This Repository?
Many GGUF repositories simply convert and upload files.
This repository focuses on practical deployment.
Every release is:
- Converted using the latest llama.cpp
- Metadata verified
- Runtime tested
- Organized for easy deployment
- Designed for local AI workflows
Recommended Usage
MiniCPM5-1B performs best as a lightweight component in a larger AI system.
Agent Router
Classify user requests and route them to the appropriate model or tool.
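A minimal sketch of the routing idea, assuming the model is asked to answer with a single label. The route names, the prompt wording, and the `ask_model` callable are hypothetical placeholders and not part of this repository; `ask_model` stands in for any function that sends a prompt to MiniCPM5-1B and returns its text reply.

```python
from typing import Callable

# Hypothetical route labels; replace with the tools or models in your system.
ROUTES = ("code", "search", "chat")
FALLBACK = "chat"


def build_router_prompt(user_request: str) -> str:
    """Ask the model to reply with exactly one route label."""
    labels = ", ".join(ROUTES)
    return (
        f"Classify the request into one of: {labels}.\n"
        f"Reply with the label only.\n\nRequest: {user_request}"
    )


def parse_route(reply: str) -> str:
    """Map a free-text model reply to a known route, falling back if unsure."""
    words = reply.strip().lower().split()
    cleaned = words[0].strip(".,:;!\"'") if words else ""
    return cleaned if cleaned in ROUTES else FALLBACK


def route(user_request: str, ask_model: Callable[[str], str]) -> str:
    """ask_model is any callable that sends a prompt to the model."""
    return parse_route(ask_model(build_router_prompt(user_request)))
```

Falling back to a default route when the reply is not a known label keeps the router usable even when a small model adds extra words or punctuation.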
Tool Caller
Generate structured tool calls before handing execution to external systems.
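One way to hand execution to external systems safely is to validate the model's output before anything runs. The sketch below assumes the model is prompted to emit a JSON object of the form `{"tool": ..., "arguments": {...}}`; that shape and the tool names in `TOOLS` are illustrative assumptions, not a format defined by MiniCPM5-1B or this repository.

```python
import json

# Hypothetical tool registry: tool name -> required argument names.
TOOLS = {
    "get_weather": {"city"},
    "read_file": {"path"},
}


def parse_tool_call(model_output: str) -> dict:
    """Parse and validate a tool call emitted by the model.

    Raises ValueError if the output is not valid JSON, names an unknown
    tool, or is missing a required argument.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")

    args = call.get("arguments")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    missing = TOOLS[name] - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return {"tool": name, "arguments": args}
```

A caller would typically catch `ValueError` and either retry the generation or escalate the request to a larger model.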
Local Coding Assistant
Provide fast code suggestions and lightweight programming support.
Intent Classifier
Handle command parsing, task classification, and workflow orchestration.
Edge AI
Deploy on:
- Mini PCs
- SBCs
- Embedded Systems
- Local Servers
- Privacy-Focused Environments
Using with llama.cpp
Interactive Chat
llama-cli \
-m MiniCPM5-1B-Q4_K_M.gguf
OpenAI-Compatible API Server
llama-server \
-m MiniCPM5-1B-Q4_K_M.gguf \
--host 0.0.0.0 \
--port 8080
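API base URL:
http://localhost:8080
OpenAI-compatible routes are served under /v1, for example http://localhost:8080/v1/chat/completions.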
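As an illustration, the server started above can be called from Python with only the standard library through llama-server's OpenAI-compatible chat route, `/v1/chat/completions`. This is a sketch: the `model` value is an assumed label (the server answers with whichever model was loaded via `-m`), and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # matches --port 8080 above


def build_chat_request(prompt: str, temperature: float = 0.7) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat route."""
    payload = {
        # Assumption: this name is only a label when one model is loaded with -m.
        "model": "MiniCPM5-1B-Q4_K_M",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to a running llama-server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Explain what an AI agent is."))` prints the model's reply.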
Using with Ollama
1. Create Model Directory
mkdir MiniCPM5-Ollama
cd MiniCPM5-Ollama
Copy your GGUF file:
cp ../MiniCPM5-1B-Q4_K_M.gguf .
2. Create Modelfile
Create a file named:
Modelfile
Example:
FROM ./MiniCPM5-1B-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 8192
SYSTEM """
You are a helpful AI assistant.
"""
3. Build Ollama Model
ollama create minicpm5-1b -f Modelfile
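When several variants of the model are needed (for example one per system prompt), the Modelfile passed to `ollama create` can be generated instead of written by hand. The helper below is a hypothetical sketch that emits only the `FROM`, `PARAMETER`, and `SYSTEM` instructions used in the example Modelfile.

```python
def render_modelfile(gguf_path: str, system: str, **params) -> str:
    """Render a Modelfile with FROM, PARAMETER, and SYSTEM instructions."""
    if '"""' in system:
        # A triple quote would end the SYSTEM block early.
        raise ValueError("system prompt must not contain triple quotes")
    lines = [f"FROM {gguf_path}"]
    lines += [f"PARAMETER {key} {value}" for key, value in params.items()]
    lines += ['SYSTEM """', system.strip(), '"""']
    return "\n".join(lines) + "\n"
```

For example, `pathlib.Path("Modelfile").write_text(render_modelfile("./MiniCPM5-1B-Q4_K_M.gguf", "You are a helpful AI assistant.", temperature=0.7, top_p=0.9, num_ctx=8192))` reproduces the Modelfile from step 2.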
Verify installation:
ollama list
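Expected output (abridged; the real listing also shows ID and MODIFIED columns, and the name carries a :latest tag):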
NAME SIZE
minicpm5-1b 0.7 GB
4. Run Interactive Chat
ollama run minicpm5-1b
Example:
>>> Explain Rust ownership in simple terms.
5. Run with Custom Prompt
ollama run minicpm5-1b "Write a Python function to reverse a string."
6. Generate via API
Start Ollama:
ollama serve
Generate text:
curl http://localhost:11434/api/generate \
-d '{
"model":"minicpm5-1b",
"prompt":"Explain what an AI Agent is."
}'
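By default, `/api/generate` streams its answer as newline-delimited JSON objects, each carrying a `response` fragment and a `done` flag. The sketch below shows one way to reassemble such a stream in Python; the function name is hypothetical, and it accepts any iterable of byte lines, such as the one returned by `requests.post(..., stream=True).iter_lines()`.

```python
import json
from typing import Iterable


def join_generate_stream(lines: Iterable[bytes]) -> str:
    """Concatenate the "response" fragments of a streamed /api/generate reply.

    Each non-empty line is expected to be one JSON object; reading stops at
    the object whose "done" field is true.
    """
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

Sending `"stream": false` in the request body avoids this step and returns a single JSON object instead.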
7. Chat API
curl http://localhost:11434/api/chat \
-d '{
"model":"minicpm5-1b",
"messages":[
{
"role":"user",
"content":"Write a simple Rust web server."
}
]
}'
8. Python Example
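import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "minicpm5-1b",
        "prompt": "Explain tool calling.",
        # Without this, Ollama streams one JSON object per line and
        # response.json() cannot parse the body.
        "stream": False,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])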
9. OpenAI SDK Compatibility
Ollama provides an OpenAI-compatible API.
pip install openai
Python example:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
)

response = client.chat.completions.create(
    model="minicpm5-1b",
    messages=[
        {
            "role": "user",
            "content": "Explain AI agents.",
        }
    ],
)
print(response.choices[0].message.content)
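Because the API is OpenAI-compatible, MiniCPM5-1B can be used with frameworks that accept a custom OpenAI base URL, such as:
- LangGraph
- CrewAI
- Flowise
- OpenHands
- OpenManus
- n8n
- Dify
- AutoGen
- Custom Agent Frameworks
In most cases this needs only a configuration change (base URL and model name), not changes to application code.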
10. Open WebUI
Start Open WebUI:
docker run -d \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:main
Open:
http://localhost:3000
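Open WebUI looks for Ollama on the host machine, which the container reaches as host.docker.internal (hence the --add-host flag above). On the host, Ollama listens by default on:
http://localhost:11434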
Example Agent-Oriented Modelfile
FROM ./MiniCPM5-1B-Q4_K_M.gguf
PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER num_ctx 8192
SYSTEM """
You are an AI agent specialized in:
- Task planning
- Tool selection
- Intent classification
- Workflow orchestration
Respond clearly and efficiently.
"""
Recommended Settings
AI Agents
temperature = 0.2
top_p = 0.9
num_ctx = 8192
Coding
temperature = 0.1
top_p = 0.95
num_ctx = 8192
General Chat
temperature = 0.7
top_p = 0.9
num_ctx = 8192
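These settings do not have to be baked into a Modelfile: Ollama's API also accepts them per request in an `options` object. The sketch below builds a `/api/chat` request body from a named preset; the preset keys (`agent`, `coding`, `chat`) and the function name are hypothetical, while the values are the ones listed above.

```python
# Presets copied from the recommended settings above.
PRESETS = {
    "agent": {"temperature": 0.2, "top_p": 0.9, "num_ctx": 8192},
    "coding": {"temperature": 0.1, "top_p": 0.95, "num_ctx": 8192},
    "chat": {"temperature": 0.7, "top_p": 0.9, "num_ctx": 8192},
}


def chat_payload(prompt: str, preset: str = "chat", model: str = "minicpm5-1b") -> dict:
    """Build a non-streaming /api/chat request body using a named preset."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset {preset!r}; choose from {sorted(PRESETS)}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "options": dict(PRESETS[preset]),  # copy so callers can tweak safely
        "stream": False,
    }
```

The returned dictionary can be passed as the `json=` argument of `requests.post("http://localhost:11434/api/chat", ...)`.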
Hardware Requirements
Approximate memory requirements:
| Quant | Memory |
|---|---|
| IQ4_XS | ~0.5 GB |
| Q4_K_S | ~0.6 GB |
| Q4_K_M | ~0.7 GB |
| Q8_0 | ~1.1 GB |
| F16 | ~2.5 GB |
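The table can be turned into a simple selection rule: pick the highest-quality quantization whose approximate footprint fits the memory available. The sketch below uses the figures from the table; the `headroom_gb` default is an assumed allowance for the KV cache and runtime overhead, not a measured value, and the function name is hypothetical.

```python
from typing import Optional

# Approximate memory per quantization (GB), from the table above,
# ordered roughly from highest to lowest quality.
QUANT_MEMORY_GB = [
    ("F16", 2.5),
    ("Q8_0", 1.1),
    ("Q4_K_M", 0.7),
    ("Q4_K_S", 0.6),
    ("IQ4_XS", 0.5),
]


def pick_quant(available_gb: float, headroom_gb: float = 0.5) -> Optional[str]:
    """Return the highest-quality quant that fits, or None if none fits.

    headroom_gb is an assumed allowance for the KV cache and runtime
    overhead; it is not a measured figure.
    """
    budget = available_gb - headroom_gb
    for name, needed_gb in QUANT_MEMORY_GB:
        if needed_gb <= budget:
            return name
    return None
```

Larger context windows need more headroom, so the default should be raised when `num_ctx` is increased.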
Recommended Quant
For most users:
MiniCPM5-1B-Q4_K_M.gguf
Provides the best balance between:
- Memory Usage
- Speed
- Quality
Acknowledgements
Original model:
OpenBMB - MiniCPM5-1B
GGUF conversion generated using:
- llama.cpp
- GGUF format
All credit for model architecture and training belongs to the original authors.
License
Please follow the original MiniCPM5 license and usage terms.
Model tree for bombman/MiniCPM5-1B-GGUF
Base model
openbmb/MiniCPM5-1B