Instructions for using openbmb/MiniCPM5-1B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use openbmb/MiniCPM5-1B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="openbmb/MiniCPM5-1B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so the generated reply is visible when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM5-1B")
model = AutoModelForCausalLM.from_pretrained("openbmb/MiniCPM5-1B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use openbmb/MiniCPM5-1B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "openbmb/MiniCPM5-1B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openbmb/MiniCPM5-1B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
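
The same request can also be sent from Python. This is a minimal sketch using only the standard library, assuming the server started above is listening on localhost:8000 and returns the usual OpenAI-style response shape; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the OpenAI-style layout: choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(chat("http://localhost:8000", "openbmb/MiniCPM5-1B", "What is the capital of France?"))
```

For the SGLang server the same helpers should work with the base URL changed to `http://localhost:30000`.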

Use Docker

docker model run hf.co/openbmb/MiniCPM5-1B

SGLang

How to use openbmb/MiniCPM5-1B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "openbmb/MiniCPM5-1B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openbmb/MiniCPM5-1B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "openbmb/MiniCPM5-1B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openbmb/MiniCPM5-1B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use openbmb/MiniCPM5-1B with Docker Model Runner:
```
docker model run hf.co/openbmb/MiniCPM5-1B
```

Hindi fine-tune of MiniCPM5-1B now available + GGUF quants

by pankajpandey-dev - opened 4 days ago

Discussion

pankajpandey-dev

4 days ago

Hi @openbmb team and community! 👋

Thanks for releasing MiniCPM5-1B — the tokenizer handles Devanagari beautifully (0.81 tokens/char on Hindi text) and the model is the perfect size for low-resource Indic adaptation.
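
A tokens-per-character figure like the one above can be measured roughly as follows. This is a minimal sketch: the `tokens_per_char` helper and the sample sentence are illustrative, characters are counted as Unicode code points (an assumption, since the post does not say how they were counted), and the result depends on the evaluation text, which is not specified. A toy encoder stands in for the real tokenizer so the snippet runs without downloading the model.

```python
from typing import Callable, Iterable, List


def tokens_per_char(encode: Callable[[str], List], texts: Iterable[str]) -> float:
    """Return total token count divided by total character count over `texts`."""
    total_tokens = 0
    total_chars = 0
    for text in texts:
        total_tokens += len(encode(text))
        total_chars += len(text)  # Unicode code points, not bytes or graphemes
    if total_chars == 0:
        raise ValueError("texts must contain at least one character")
    return total_tokens / total_chars


# Toy stand-in encoder: one "token" per whitespace-separated word.
def toy_encode(text: str) -> List[str]:
    return text.split()


print(tokens_per_char(toy_encode, ["नमस्ते दुनिया"]))

# With the real tokenizer (downloads the model files):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("openbmb/MiniCPM5-1B")
# tokens_per_char(lambda s: tok.encode(s, add_special_tokens=False), hindi_texts)
```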

I've released a Hindi instruction-tuned version trained on AI4Bharat's indic-instruct-data-v0.1 (anudesh + dolly Hindi splits, ~4k high-quality examples):

🔗 HF Model: https://huggingface.co/pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct
🔗 GGUF Quants (Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0): https://huggingface.co/pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct-v1-GGUF

Training stack: Unsloth + TRL + LoRA (r=32), 60 min on a single T4. Full details on the model card.
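
For a sense of what r=32 means in trainable parameters: LoRA freezes a weight matrix of shape d_out × d_in and trains two low-rank factors, B (d_out × r) and A (r × d_in), so each adapted matrix adds r · (d_in + d_out) parameters. The sketch below does that arithmetic. The layer names and shapes are hypothetical placeholders, not MiniCPM5-1B's actual dimensions or the target modules used in this fine-tune; the model config and model card are the authoritative sources.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A has shape (r, d_in); B has shape (d_out, r).
    return r * d_in + d_out * r


# Hypothetical (d_in, d_out) shapes for illustration only,
# NOT taken from the MiniCPM5-1B config.
hypothetical_layers = {
    "q_proj": (2048, 2048),
    "v_proj": (2048, 512),
}

r = 32  # rank quoted in the post
per_layer = {
    name: lora_param_count(d_in, d_out, r)
    for name, (d_in, d_out) in hypothetical_layers.items()
}
print(per_layer)
```

Under these placeholder shapes, the adapter for a 2048 × 2048 matrix holds about 3% as many parameters as the matrix itself, which is the kind of saving that makes training on a single T4 plausible.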

One note for the llama.cpp folks: the BPE pre-tokenizer hash isn't in llama.cpp's registry yet — I registered 36f3066e97b7f3994b379aaacde306c1444c6ae84e81a5ae3cd2b7ed3b8c42d4 → qwen2 as the closest match and conversion worked cleanly. Happy to submit a PR to llama.cpp upstream if this is the right pre-tokenizer family for MiniCPM5.
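
For context on the hash: llama.cpp's conversion script identifies a BPE pre-tokenizer by tokenizing a fixed check string, hashing the resulting token ids with SHA-256, and looking the digest up in a table of known values. The sketch below mirrors that idea only. The function names, the registry dict, and the sample token ids are illustrative and are not copied from llama.cpp; the single registry entry is the hash-to-family mapping quoted in the post.

```python
from hashlib import sha256
from typing import Dict, List, Optional

# Hash quoted in the post, mapped to the pre-tokenizer family chosen there.
KNOWN_PRE_TOKENIZERS: Dict[str, str] = {
    "36f3066e97b7f3994b379aaacde306c1444c6ae84e81a5ae3cd2b7ed3b8c42d4": "qwen2",
}


def pre_tokenizer_hash(token_ids: List[int]) -> str:
    """SHA-256 hex digest of the stringified token-id list."""
    return sha256(str(token_ids).encode()).hexdigest()


def lookup_pre_tokenizer(token_ids: List[int]) -> Optional[str]:
    """Return the registered family name, or None if the hash is unknown."""
    return KNOWN_PRE_TOKENIZERS.get(pre_tokenizer_hash(token_ids))


# Made-up ids; a real run would use the ids the tokenizer produces
# for the conversion script's check text.
sample_ids = [101, 202, 303]
digest = pre_tokenizer_hash(sample_ids)
print(digest, lookup_pre_tokenizer(sample_ids))
```

An unknown digest returns `None` here, which corresponds to the situation described above: conversion cannot proceed until the hash is registered against some pre-tokenizer family.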

Looking forward to more Indic fine-tunes of this base — thanks again!

xcjthu

OpenBMB org 3 days ago

Hi Pankaj, thank you so much for the great work! 👏

We’re really excited to see MiniCPM5-1B adapted for Hindi instruction tuning, and the GGUF quants will be very helpful for the community.

Regarding the llama.cpp tokenizer / pre-tokenizer issue, we have already adapted a version for reference:

https://github.com/zhangtao2-1/llama.cpp/

Thanks again for the excellent contribution — looking forward to more fine-tuned variants built on MiniCPM5! 🚀

TreeLoys

2 days ago

Can you train it on Russian?

pankajpandey-dev

1 day ago

> Can you train on russian language?

I haven’t worked with Russian datasets personally yet, but it should definitely be possible to fine-tune MiniCPM5-1B for Russian as well.

The main challenge for me would be evaluation and alignment quality since I don’t know Russian. If members of the community are interested in collaborating on datasets, evaluation, or benchmarking, I’d be very happy to help with the training side 🙂

TreeLoys

1 day ago

I have no idea how to train it. All 1B models are poorly optimized for Russian and other languages (1B models are optimized only for English), and that is a reason not to use a small model.
