Instructions to use tencent/HunyuanOCR with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use tencent/HunyuanOCR with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="tencent/HunyuanOCR")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import HunYuanVLForConditionalGeneration
model = HunYuanVLForConditionalGeneration.from_pretrained("tencent/HunyuanOCR", dtype="auto")

Local Apps

vLLM

How to use tencent/HunyuanOCR with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "tencent/HunyuanOCR"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tencent/HunyuanOCR",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
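
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; `build_chat_payload` and `post_chat` are hypothetical helper names introduced here for illustration, and the default base URL assumes the `vllm serve` command above is running locally on port 8000.

```python
import json
import urllib.request


def build_chat_payload(image_url, prompt, model="tencent/HunyuanOCR"):
    """Build the same JSON body as the curl example: one text part, one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(payload, base_url="http://localhost:8000"):
    """POST the payload to the OpenAI-compatible endpoint and return the parsed reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_chat_payload(
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
    "Describe this image in one sentence.",
)

# Sending the request needs a running server, so it is left commented out:
# reply = post_chat(payload)
# print(reply["choices"][0]["message"]["content"])
```

Building the payload separately from sending it keeps the request body easy to inspect or log before any network call is made.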

Use Docker

docker model run hf.co/tencent/HunyuanOCR

SGLang

How to use tencent/HunyuanOCR with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "tencent/HunyuanOCR" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tencent/HunyuanOCR",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "tencent/HunyuanOCR" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use tencent/HunyuanOCR with Docker Model Runner:
```
docker model run hf.co/tencent/HunyuanOCR
```

run with api is too slow

#13

by medisean - opened Nov 27, 2025

Discussion

medisean

Nov 27, 2025

Running HunyuanOCR with Transformers behind FastAPI is too slow (about 10+ seconds per request), even though I have a GPU. How can I solve this? My code is:

from fastapi import FastAPI, File, UploadFile
from fastapi.responses import JSONResponse
from transformers import AutoProcessor, HunYuanVLForConditionalGeneration
from PIL import Image
import torch
import io
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "1"

app = FastAPI(title="HunYuan OCR API")


def clean_repeated_substrings(text):
    """Clean repeated substrings in text"""
    n = len(text)
    # Short outputs are returned unchanged.
    if n < 8000:
        return text
    for length in range(2, n // 10 + 1):
        candidate = text[-length:]
        count = 0
        i = n - length

        # Count how many times the trailing substring repeats back to back.
        while i >= 0 and text[i:i + length] == candidate:
            count += 1
            i -= length

        # Keep a single copy when the tail repeats 10 or more times.
        if count >= 10:
            return text[:n - length * (count - 1)]

    return text


# ---------- Load model & processor once (global) ----------

model_name = "/models/HunyuanOCR"

processor = AutoProcessor.from_pretrained(model_name, use_fast=False)
model = HunYuanVLForConditionalGeneration.from_pretrained(
    model_name,
    attn_implementation="eager",
    dtype=torch.bfloat16,
    device_map="auto",
)

device = next(model.parameters()).device


# -------------------- API Route --------------------

@app.post("/ocr")
async def ocr_api(file: UploadFile = File(...)):
    try:
        # Read image bytes
        img_bytes = await file.read()
        image = Image.open(io.BytesIO(img_bytes))

        # Construct messages
        # Prompt: "Detect and recognize the text in the image, and output
        # the text coordinates in a formatted way."
        messages1 = [
            {"role": "system", "content": ""},
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": "uploaded_image"},
                    {"type": "text", "text": "检测并识别图片中的文字，将文本坐标格式化输出。"},
                ],
            },
        ]
        messages = [messages1]

        texts = [
            processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
            for msg in messages
        ]

        # Prepare inputs
        inputs = processor(
            text=texts,
            images=image,
            padding=True,
            return_tensors="pt",
        ).to(device)

        # Generate
        with torch.no_grad():
            generated_ids = model.generate(
                **inputs, max_new_tokens=1024, do_sample=False
            )

        # Strip the prompt tokens so only newly generated tokens are decoded
        input_ids = inputs.input_ids
        generated_ids_trimmed = [
            out_ids[len(in_ids):] for in_ids, out_ids in zip(input_ids, generated_ids)
        ]

        output_text = processor.batch_decode(
            generated_ids_trimmed,
            skip_special_tokens=True,
            clean_up_tokenization_spaces=False,
        )

        # batch_decode returns a list; clean the single decoded string
        output_text = clean_repeated_substrings(output_text[0])

        return JSONResponse({"text": output_text})

    except Exception as e:
        return JSONResponse({"error": str(e)}, status_code=500)
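
To see where the 10+ seconds go, it may help to time each stage of the handler separately before changing anything. Below is a minimal sketch, assuming a hypothetical `StageTimer` helper (it is not part of FastAPI or Transformers); the `time.sleep` calls are stand-ins for the real preprocessing, `model.generate`, and decoding steps.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Collect wall-clock durations for named stages of a request (hypothetical helper)."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate so a stage entered more than once reports its total time.
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def report(self):
        """Return (stage, seconds) pairs sorted from slowest to fastest."""
        return sorted(self.durations.items(), key=lambda item: item[1], reverse=True)


timer = StageTimer()
with timer.stage("preprocess"):
    time.sleep(0.01)   # stand-in for processor(...)
with timer.stage("generate"):
    time.sleep(0.05)   # stand-in for model.generate(...)
with timer.stage("decode"):
    time.sleep(0.001)  # stand-in for processor.batch_decode(...)

for name, seconds in timer.report():
    print(f"{name}: {seconds:.3f}s")
```

In the real handler, each `with timer.stage(...)` block would wrap the corresponding call, and the report could be logged or returned alongside the OCR text while debugging.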

AmandaZ

Dec 8, 2025

Hi, check this out for HunyuanOCR served by an inference provider: https://console.gmicloud.ai/playground/llm/hunyuanocr/3de77397-542f-49d5-830b-4a6c73811f88
