Instructions to use prithivMLmods/visionOCR-3B-061125 with libraries and local apps.

Libraries

How to use prithivMLmods/visionOCR-3B-061125 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="prithivMLmods/visionOCR-3B-061125")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("prithivMLmods/visionOCR-3B-061125")
model = AutoModelForImageTextToText.from_pretrained("prithivMLmods/visionOCR-3B-061125")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use prithivMLmods/visionOCR-3B-061125 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/visionOCR-3B-061125"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/visionOCR-3B-061125",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
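
The same request can also be sent from Python. The sketch below is a rough equivalent of the curl call above using only the standard library. It assumes the vLLM server from the previous step is listening on localhost:8000, and the helper names `build_chat_payload` and `chat` are made up for this illustration, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/visionOCR-3B-061125"


def build_chat_payload(image_url, prompt, model=MODEL_ID):
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(base_url, payload, timeout=120.0):
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the text under choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Usage (needs a running server, so it is left commented out):
# payload = build_chat_payload(
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
#     "Describe this image in one sentence.",
# )
# print(chat("http://localhost:8000", payload))
```

For the SGLang server described further down, the same helpers should work with the base URL changed to http://localhost:30000.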

Use Docker

docker model run hf.co/prithivMLmods/visionOCR-3B-061125

SGLang

How to use prithivMLmods/visionOCR-3B-061125 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/visionOCR-3B-061125" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/visionOCR-3B-061125",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/visionOCR-3B-061125" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as in the pip instructions above (port 30000).

Docker Model Runner
How to use prithivMLmods/visionOCR-3B-061125 with Docker Model Runner:
```
docker model run hf.co/prithivMLmods/visionOCR-3B-061125
```

Org Chart Hierarchy

by okayatul - opened Jul 9, 2025

Discussion

okayatul

Jul 9, 2025

I have a use case where I need to retrieve information from an org chart and output it in JSON while maintaining the correct hierarchy.

I was using Mistral Small 3.1 (24B), but it’s 17GB in size and its accuracy averages 80-90%. I’m considering switching to a smaller model and fine-tuning it, which would reduce the size and might improve accuracy.

Please help me here. If I’m wrong in any way, feel free to suggest alternatives.
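
To make "JSON while maintaining the correct hierarchy" concrete, here is a minimal sketch of one possible target format and a check on a model's reply. The nested shape with `name` and `reports` keys and the people in the example are assumptions for illustration only; the post does not specify a schema.

```python
import json


def check_node(node):
    """Raise ValueError unless the node matches the assumed nested shape."""
    if not isinstance(node, dict):
        raise ValueError("each node must be a JSON object")
    if not isinstance(node.get("name"), str) or not node["name"]:
        raise ValueError("each node needs a non-empty 'name'")
    reports = node.get("reports", [])
    if not isinstance(reports, list):
        raise ValueError("'reports' must be a list")
    for report in reports:
        check_node(report)


def parse_org_chart(raw):
    """Parse a model reply as JSON and validate the hierarchy shape."""
    chart = json.loads(raw)
    check_node(chart)
    return chart


def depth(node):
    """Number of levels in the chart, counting the root as level 1."""
    return 1 + max((depth(r) for r in node.get("reports", [])), default=0)


def count_people(node):
    """Total number of people in the chart."""
    return 1 + sum(count_people(r) for r in node.get("reports", []))


# Hypothetical model reply for a four-person chart
reply = """
{"name": "CEO", "reports": [
    {"name": "CTO", "reports": [{"name": "Engineer"}]},
    {"name": "CFO"}
]}
"""
chart = parse_org_chart(reply)
print(depth(chart), count_people(chart))  # prints: 3 4
```

A reply that is not valid JSON, or that drops a `name`, raises `ValueError`, which gives a cheap first filter before comparing the hierarchy itself.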

prithivMLmods

Owner Jul 9, 2025

You're right to consider switching to a smaller model and fine-tuning it for your specific task. @okayatul

If your main goal is to extract hierarchical relationships and convert them into structured JSON, you might not need a 24B model. A smaller model (around 3B–7B) fine-tuned on your kind of org chart data could actually perform better than a larger general-purpose model, especially if the domain is narrow or repetitive.

Try models built on Qwen2-VL or Qwen2.5-VL (e.g., Nanonets, Monkey OCR). Test them all and pick the one that best suits your needs.
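
When testing several models side by side, it helps to score the hierarchy itself rather than eyeballing the JSON. Below is a minimal sketch, assuming each chart is a nested dict with `name` and `reports` keys (an assumed format, not something any of these models emits by default). It compares the (manager, person) pairs in a prediction against a hand-labelled reference.

```python
def manager_edges(node, manager=None):
    """Return the set of (manager, person) pairs in a nested org chart."""
    edges = {(manager, node["name"])}
    for report in node.get("reports", []):
        edges |= manager_edges(report, node["name"])
    return edges


def edge_scores(predicted, expected):
    """Precision, recall and F1 over (manager, person) pairs."""
    pred = manager_edges(predicted)
    gold = manager_edges(expected)
    correct = len(pred & gold)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference chart and a prediction that puts one person
# under the wrong manager
expected = {"name": "CEO", "reports": [
    {"name": "CTO", "reports": [{"name": "Engineer"}]},
    {"name": "CFO"},
]}
predicted = {"name": "CEO", "reports": [
    {"name": "CTO"},
    {"name": "CFO", "reports": [{"name": "Engineer"}]},
]}
print(edge_scores(predicted, expected))
# prints: {'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

Because the pairs are stored in a set keyed by name, two people with the same name under the same manager collapse into one entry; a real evaluation would need unique identifiers.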

prithivMLmods changed discussion status to closed Jul 9, 2025

AtulOk

Jul 10, 2025

Thanks for the reply. However, the issue here is the dataset: as far as I can see, this type of data (organizational charts) isn't available anywhere. Please guide me on what to do and how to proceed.

Secondly, I have also tried the OCR tools you mentioned, but they didn't return any results when I provided an org chart as input.
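
One possible way around the missing dataset, offered only as a sketch and not something suggested in this thread, is to generate synthetic charts: build random hierarchies, keep the JSON as the label, and render each one to an image. The code below emits Graphviz DOT text for rendering. The function names, the `name`/`reports` format and the placeholder person names are all assumptions for illustration.

```python
import json
import random


def random_org_chart(rng, max_depth=3, max_reports=3):
    """Build a random nested chart with placeholder names."""
    counter = 0

    def make(level):
        nonlocal counter
        counter += 1
        node = {"name": f"Person {counter}", "reports": []}
        if level < max_depth:
            for _ in range(rng.randint(0, max_reports)):
                node["reports"].append(make(level + 1))
        return node

    return make(1)


def to_dot(chart):
    """Convert a nested chart to Graphviz DOT text, one edge per report."""
    lines = ["digraph org {", "  rankdir=TB;", "  node [shape=box];"]
    lines.append(f'  "{chart["name"]}";')

    def walk(node):
        for report in node["reports"]:
            lines.append(f'  "{node["name"]}" -> "{report["name"]}";')
            walk(report)

    walk(chart)
    lines.append("}")
    return "\n".join(lines)


rng = random.Random(0)  # fixed seed so the sample is reproducible
chart = random_org_chart(rng)
label = json.dumps(chart, indent=2)  # JSON target for fine-tuning
dot_source = to_dot(chart)           # input for the renderer
print(dot_source)
```

If Graphviz is installed, saving `dot_source` to `chart.dot` and running `dot -Tpng chart.dot -o chart.png` produces the image, and the (image, `label`) pair becomes one training example. Real org charts vary far more in layout and styling, so synthetic data like this would likely need to be mixed with a small set of hand-labelled real charts.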
