gguf / llama.cpp support
Is there a chance you will finish your PR on llama.cpp? (https://github.com/ggerganov/llama.cpp/pull/7599)
It looks like 90% is done, but if a few more months pass the PR will become useless: llama.cpp keeps evolving and changing, so any stale PR is destined to be lost.
On the positive side: once it is integrated the developer team will keep it maintained and working.
MiniCPM is such a good model; it would be the strongest visual model on llama.cpp (including ollama and other wrappers). Given the work and effort already spent, I really hope to see the PR completed so that none of it goes to waste.
Hi, thank you for your attention, and I hope my reply didn't keep you waiting too long.
Yes, we will finish it this week.
We have always aimed to create open-source models that can help more people, and the integration of gguf/llama.cpp is a key part of it. We did notice that the current version merged into the main branch has inconsistencies in the tokenization process compared to the older version in our fork. We have now asked @tc-mb to dedicate his full efforts in the coming days to truly complete this PR. We hope you understand that our manpower is limited, and we aim to deliver a PR this week that has undergone comprehensive precision testing and can be directly merged!
That's great to hear! I really hope it works out.