Instructions to use prithivMLmods/visionOCR-3B-061125 with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use prithivMLmods/visionOCR-3B-061125 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="prithivMLmods/visionOCR-3B-061125")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("prithivMLmods/visionOCR-3B-061125")
model = AutoModelForImageTextToText.from_pretrained("prithivMLmods/visionOCR-3B-061125")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use prithivMLmods/visionOCR-3B-061125 with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "prithivMLmods/visionOCR-3B-061125"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "prithivMLmods/visionOCR-3B-061125",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
Use Docker
docker model run hf.co/prithivMLmods/visionOCR-3B-061125
- SGLang
How to use prithivMLmods/visionOCR-3B-061125 with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/visionOCR-3B-061125" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "prithivMLmods/visionOCR-3B-061125",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/visionOCR-3B-061125" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "prithivMLmods/visionOCR-3B-061125",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
- Docker Model Runner
How to use prithivMLmods/visionOCR-3B-061125 with Docker Model Runner:
docker model run hf.co/prithivMLmods/visionOCR-3B-061125
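The curl calls above send a public image URL. For an org chart stored on disk, one option is to inline the file as a base64 `data:` URL in the same OpenAI-compatible request. This is a minimal sketch, assuming the server accepts base64 data URLs in `image_url` (check the docs for your vLLM or SGLang version); `build_chat_payload` and `post_chat` are hypothetical helper names, and only the Python standard library is used.

```python
import base64
import json
import urllib.request

MODEL_ID = "prithivMLmods/visionOCR-3B-061125"


def build_chat_payload(image_bytes: bytes, prompt: str,
                       mime_type: str = "image/png", max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat request with the image inlined as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the assistant's text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (hypothetical file name; port 8000 matches the vLLM command, 30000 the SGLang one):
#   with open("org_chart.png", "rb") as f:
#       payload = build_chat_payload(f.read(), "Extract this org chart as JSON.")
#   print(post_chat(payload, base_url="http://localhost:8000"))
```

Building the payload separately from sending it keeps the request easy to inspect or log before it reaches the server.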
Org Chart Hierarchy
I have a use case where I need to retrieve information from an org chart and output it in JSON while maintaining the correct hierarchy.
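One possible way to keep the hierarchy explicit is a nested JSON in which every person carries a list of direct reports. A model can also be asked for flat rows (each person plus their manager) and the nesting done in code, which guarantees the final tree is well-formed. This is a sketch under assumptions: the field names `name`, `title`, `manager`, and `reports` are hypothetical, names are assumed unique within a chart, and cycles are not detected.

```python
import json
from typing import Optional


def build_tree(rows: list[dict]) -> list[dict]:
    """Turn flat rows {"name", "title", "manager"} into nested nodes with "reports".

    Rows whose manager is None (or not found among the rows) become roots.
    Assumes names are unique; does not detect reporting cycles.
    """
    nodes = {r["name"]: {"name": r["name"], "title": r.get("title"), "reports": []}
             for r in rows}
    roots = []
    for r in rows:
        manager: Optional[str] = r.get("manager")
        if manager in nodes and manager != r["name"]:
            nodes[manager]["reports"].append(nodes[r["name"]])
        else:
            roots.append(nodes[r["name"]])
    return roots


# Hypothetical example rows, as a model might return them.
rows = [
    {"name": "Alice", "title": "CEO", "manager": None},
    {"name": "Bob", "title": "CTO", "manager": "Alice"},
    {"name": "Carol", "title": "Engineer", "manager": "Bob"},
]
print(json.dumps(build_tree(rows), indent=2))
```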
I was using Mistral Small 3.1 (24B), but it’s 17 GB in size, and its accuracy averages 80-90%. I’m considering switching to a smaller model and fine-tuning it; that would reduce the size, and accuracy might improve after fine-tuning.
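The post does not say how the 80-90% accuracy is measured. For hierarchy extraction, one hedged option is to compare (employee, manager) pairs between the predicted and reference trees, so a person attached to the wrong manager counts as an error even if their name was read correctly. This sketch assumes nested nodes with hypothetical `name` and `reports` keys and unique names; the root's pair uses `None` as its manager, so getting the top of the chart right is also scored.

```python
from typing import Optional


def edges(nodes: list[dict], manager: Optional[str] = None) -> set[tuple[str, Optional[str]]]:
    """Collect (employee, manager) pairs from a nested tree with "name"/"reports" keys."""
    pairs = set()
    for node in nodes:
        pairs.add((node["name"], manager))
        pairs |= edges(node.get("reports", []), node["name"])
    return pairs


def edge_f1(predicted: list[dict], gold: list[dict]) -> dict[str, float]:
    """Precision, recall, and F1 over reporting relationships."""
    pred, ref = edges(predicted), edges(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: Carol is placed under Alice instead of Bob.
gold = [{"name": "Alice", "reports": [
    {"name": "Bob", "reports": [{"name": "Carol", "reports": []}]},
    {"name": "Dan", "reports": []}]}]
pred = [{"name": "Alice", "reports": [
    {"name": "Bob", "reports": []},
    {"name": "Carol", "reports": []},
    {"name": "Dan", "reports": []}]}]
print(edge_f1(pred, gold))  # → {'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

A pair-based score like this makes it easier to tell whether a smaller fine-tuned model actually preserves the hierarchy better than the 24B baseline.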
Please help me here. If I’m wrong in any way, feel free to suggest alternatives.
@okayatul You're right to consider switching to a smaller model and fine-tuning it for your specific task.
If your main goal is to extract hierarchical relationships and convert them into structured JSON, you might not need a 24B model. A smaller model (3B–7B) fine-tuned specifically on org chart data like yours could actually outperform a larger general-purpose model, especially if the domain is narrow or repetitive.
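Fine-tuning on org charts means pairing each chart image with the exact JSON you want back. A minimal sketch of writing such pairs to a JSONL file, assuming a chat-style layout similar to the `messages` structure in the Transformers snippet, with image paths kept in a separate `images` list. This layout, the `PROMPT` wording, and the helper names are hypothetical; the precise schema depends on the training library you use, so check its documentation.

```python
import json
from pathlib import Path

PROMPT = ("Extract every person in this org chart as JSON: a list of nodes with "
          "\"name\", \"title\" and \"reports\" (their direct reports).")


def make_example(image_path: str, target_tree: list[dict]) -> dict:
    """One supervised example: chart image + instruction in, org-chart JSON out."""
    return {
        # Hypothetical layout; some trainers expect loaded images rather than paths.
        "images": [image_path],
        "messages": [
            {"role": "user",
             "content": [{"type": "image"}, {"type": "text", "text": PROMPT}]},
            {"role": "assistant",
             "content": [{"type": "text",
                          "text": json.dumps(target_tree, ensure_ascii=False)}]},
        ],
    }


def write_jsonl(examples: list[dict], out_path: str) -> None:
    """Write one JSON object per line (UTF-8, non-ASCII names kept as-is)."""
    with Path(out_path).open("w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
```

Keeping the prompt identical across examples, and the same as the one used at inference time, helps the fine-tuned model associate that instruction with the JSON format.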
Try Qwen2-VL or Qwen2.5-VL, or OCR models built on them (e.g., Nanonets, MonkeyOCR). Test them all and pick the one that best suits your needs.
Thanks for the reply. However, the issue is the dataset: as far as I can see, this type of data (organizational charts specifically) isn't available anywhere. Please guide me on how to proceed.
Secondly, I tried the OCR tools you mentioned, but they returned no results when given an org chart as input.