Not enough memory, file deleted, model offloaded to disk

by virhkx01 - opened May 15, 2025

Discussion

virhkx01

May 15, 2025

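My code is a mess right now. I have already installed what is needed with pip. I am on Windows, and running python pali.py in the PyCharm terminal starts downloading model-00001-of-00002.safetensors.
I also downloaded the files from the official website (I tried the website first because downloading in the terminal was too slow). The emulator reported that the installation had failed and that I should download again. After that I converted the files for use in PyCharm, but model-00001-of-00002.safetensors was reported as too large and could only be opened in read-only mode, so I deleted it. Then I ran python pali.py to download it again. When that finished I checked the folder, and model-00001-of-00002.safetensors was not there.
When I run the script, it reports that there is not enough memory and that it is offloading to disk, but then the terminal hangs and never makes progress.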

lkv

Google org Jul 30, 2025

Hi, sorry for the delay. Download the complete model files (model-00001-of-00002.safetensors and model-00002-of-00002.safetensors) and keep them in the same folder.
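A quick way to confirm that both shards are actually in the folder is a small check like the sketch below. It assumes the two shard names mentioned above, with the .safetensors extension used on the Hub; the function name missing_shards and the folder path in the usage line are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def missing_shards(folder, total=2):
    """Return the names of expected weight shards that are not present in `folder`.

    `total` is the number of shards the checkpoint is split into
    (assumed to be 2 here, matching the two file names in this thread).
    """
    expected = [
        f"model-{index:05d}-of-{total:05d}.safetensors"
        for index in range(1, total + 1)
    ]
    # Keep only the expected names that do not exist as regular files.
    return [name for name in expected if not (Path(folder) / name).is_file()]


if __name__ == "__main__":
    # Hypothetical local folder; replace with the folder you downloaded into.
    print(missing_shards("paligemma2-3b-mix-224"))
```

An empty list means both shards are present; any name that is printed still has to be downloaded into that folder.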

If downloads from the official website are slow or incomplete, try downloading via the Hugging Face Hub CLI (huggingface-cli), which supports resumable downloads. Please try this and let us know if you have any concerns; we will assist you.
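The same download can also be started from Python with the huggingface_hub library, which is the package that provides huggingface-cli. The following is a minimal sketch, not an official recipe: the function name download_model and the target folder paligemma2-3b-mix-224 are hypothetical, and it assumes huggingface_hub is installed and that you are logged in with an account that has access to the repository, in case access is restricted.

```python
def download_model(
    repo_id="google/paligemma2-3b-mix-224",
    local_dir="paligemma2-3b-mix-224",  # hypothetical target folder
):
    """Download every file of the model repository into `local_dir`.

    Re-running the function after an interruption continues the download
    instead of starting over from scratch.
    """
    # Imported here so the module can be loaded even before
    # huggingface_hub is installed (pip install huggingface_hub).
    from huggingface_hub import snapshot_download

    # Returns the path of the folder that now contains the files.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    print(download_model())
```

After the download finishes, both shard files should be in the target folder, and the script can load the model from that folder instead of downloading it again.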

Thank you.
