Instructions to use openbmb/MiniCPM5-1B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openbmb/MiniCPM5-1B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="openbmb/MiniCPM5-1B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM5-1B")
model = AutoModelForCausalLM.from_pretrained("openbmb/MiniCPM5-1B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use openbmb/MiniCPM5-1B with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "openbmb/MiniCPM5-1B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openbmb/MiniCPM5-1B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
docker model run hf.co/openbmb/MiniCPM5-1B
- SGLang
How to use openbmb/MiniCPM5-1B with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "openbmb/MiniCPM5-1B" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openbmb/MiniCPM5-1B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "openbmb/MiniCPM5-1B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openbmb/MiniCPM5-1B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use openbmb/MiniCPM5-1B with Docker Model Runner:
docker model run hf.co/openbmb/MiniCPM5-1B
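The vLLM server (port 8000) and the SGLang server (port 30000) both expose an OpenAI-compatible `/v1/chat/completions` endpoint. The curl calls can therefore also be made from Python. Below is a minimal standard-library sketch. The helper names `build_chat_request` and `chat` are hypothetical, and it assumes a server is already running locally.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str) -> str:
    """Send the request and return the first choice's message content."""
    req = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `chat("http://localhost:8000", "openbmb/MiniCPM5-1B", "What is the capital of France?")` targets the vLLM server. Swap the port for `30000` to target SGLang.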
Hindi fine-tune of MiniCPM5-1B now available + GGUF quants
Hi @openbmb team and community!
Thanks for releasing MiniCPM5-1B: the tokenizer handles Devanagari beautifully (0.81 tokens/char on Hindi text), and the model is the perfect size for low-resource Indic adaptation.
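A tokens-per-character figure like the one quoted is the token count divided by the character count. Lower values mean the tokenizer needs fewer tokens to cover the same text. The sketch below is a hypothetical helper, not the poster's script. It counts characters as Python string length, which means Unicode code points. That is an assumption: Devanagari vowel signs and viramas count as separate code points, so a different character definition would give a different ratio.

```python
from typing import Callable, List


def tokens_per_char(text: str, tokenize: Callable[[str], List]) -> float:
    """Return len(tokens) / len(characters) for `text`.

    `tokenize` is any callable mapping a string to a list of tokens.
    Characters are counted as Unicode code points (len of the str).
    """
    if not text:
        raise ValueError("text must be non-empty")
    return len(tokenize(text)) / len(text)
```

With a Hugging Face tokenizer, one could pass `lambda s: tok.encode(s, add_special_tokens=False)` as `tokenize`. This excludes special tokens such as BOS from the count.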
I've released a Hindi instruction-tuned version trained on AI4Bharat's indic-instruct-data-v0.1 (anudesh + dolly Hindi splits, ~4k high-quality examples):
HF Model: https://huggingface.co/pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct
GGUF Quants (Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0): https://huggingface.co/pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct-v1-GGUF
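Instruction datasets are usually converted to a chat format before supervised fine-tuning. Below is a minimal sketch of that conversion. It assumes Dolly-style records with `instruction`, optional `context` and `response` fields. Those field names are an assumption; check the actual schema of the indic-instruct splits. `to_messages` is a hypothetical helper.

```python
from typing import Dict, List


def to_messages(record: Dict[str, str]) -> Dict[str, List[Dict[str, str]]]:
    """Convert a Dolly-style record into a chat-format training example.

    Assumes hypothetical field names "instruction", optional "context",
    and "response"; adjust them to the real dataset schema.
    """
    prompt = record["instruction"].strip()
    context = (record.get("context") or "").strip()
    if context:
        # Append the optional context below the instruction.
        prompt = f"{prompt}\n\n{context}"
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": record["response"].strip()},
        ]
    }
```

Recent versions of TRL's `SFTTrainer` accept a conversational dataset with a `messages` column, so this shape should fit the stack described below. Verify this against the TRL version in use.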
Training stack: Unsloth + TRL + LoRA (r=32), 60 min on a single T4. Full details on the model card.
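LoRA helps explain why a 1B model can be tuned on a single T4. LoRA freezes each adapted weight W (shape d_out × d_in) and trains two low-rank factors: B (d_out × r) and A (r × d_in). That comes to r · (d_in + d_out) trainable parameters per layer. The arithmetic sketch below assumes a hypothetical 2048 × 2048 projection, which is not MiniCPM5-1B's actual configuration.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    W stays frozen; only B (d_out x r) and A (r x d_in) are trained.
    """
    return r * (d_in + d_out)


def lora_fraction(d_in: int, d_out: int, r: int) -> float:
    """Adapter size relative to the frozen full weight matrix."""
    return lora_params(d_in, d_out, r) / (d_in * d_out)
```

With r=32 on the hypothetical 2048 × 2048 layer, the adapter has 131,072 parameters. That is 1/32 of the 4,194,304 frozen weights it adapts.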
One note for the llama.cpp folks: the BPE pre-tokenizer hash isn't in llama.cpp's registry yet, so I registered 36f3066e97b7f3994b379aaacde306c1444c6ae84e81a5ae3cd2b7ed3b8c42d4 as qwen2, the closest match, and conversion worked cleanly. Happy to submit a PR to llama.cpp upstream if this is the right pre-tokenizer family for MiniCPM5.
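As we understand llama.cpp's `convert_hf_to_gguf.py`, the pre-tokenizer is identified in three steps:

- Encode a fixed check string with the Hugging Face tokenizer.
- Take the SHA-256 of the string form of the resulting token-ID list.
- Look that hash up in a hard-coded table.

The sketch below mimics only the hash-and-lookup step. The registry dict and function names are hypothetical, and the check string itself is not reproduced. The single entry is the hash from the post above, mapped to `qwen2` as the poster did. It is not an upstream entry.

```python
from hashlib import sha256
from typing import Dict, List, Optional

# Hypothetical registry: hash of the encoded check string -> pre-tokenizer name.
# The entry mirrors the local mapping described in the post, not upstream code.
PRE_TOKENIZER_REGISTRY: Dict[str, str] = {
    "36f3066e97b7f3994b379aaacde306c1444c6ae84e81a5ae3cd2b7ed3b8c42d4": "qwen2",
}


def tokenizer_hash(token_ids: List[int]) -> str:
    """SHA-256 hex digest of the token-ID list's string representation."""
    return sha256(str(token_ids).encode()).hexdigest()


def lookup_pre_tokenizer(token_ids: List[int]) -> Optional[str]:
    """Return the registered pre-tokenizer name, or None if the hash is unknown."""
    return PRE_TOKENIZER_REGISTRY.get(tokenizer_hash(token_ids))
```

An unknown hash returning `None` corresponds to the "not in the registry yet" situation described above. Registering the hash is what lets conversion proceed.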
Looking forward to more Indic fine-tunes of this base β thanks again!
Hi Pankaj, thank you so much for the great work!
We're really excited to see MiniCPM5-1B adapted for Hindi instruction tuning, and the GGUF quants will be very helpful for the community.
Regarding the llama.cpp tokenizer / pre-tokenizer issue, we have already adapted a version for reference:
https://github.com/zhangtao2-1/llama.cpp/
Thanks again for the excellent contribution; we're looking forward to more fine-tuned variants built on MiniCPM5!
Can you train it on Russian?
I haven't worked with Russian datasets personally yet, but it should definitely be possible to fine-tune MiniCPM5-1B for Russian as well.
The main challenge for me would be evaluation and alignment quality, since I don't know Russian. If members of the community are interested in collaborating on datasets, evaluation, or benchmarking, I'd be very happy to help with the training side.
I have no idea how to train it. All 1B models are poorly optimized for Russian and other languages (1B models are optimized only for English), and that is why I don't use small models.