PDF-OCR-RL: Qwen3-VL-2B SFT Only (Intermediate Checkpoint)
Fine-tuned Qwen3-VL-2B-Instruct for PDF-to-markdown conversion using Supervised Fine-Tuning (SFT) only.
This is the intermediate SFT checkpoint before GRPO refinement. The best model with GRPO is available at blazeofchi/pdf-ocr-rl-qwen3vl2b-sft-grpo.
This is a LoRA adapter (r=32, alpha=64, 2.18% trainable parameters). Load it on top of the base model using PEFT.
Training Details
This checkpoint was produced by Stage 1 of the two-stage training pipeline.
SFT Training (100 steps)
Teaches the model the image-to-markdown mapping using supervised examples from rendered PDF pages paired with their source markdown.
| Parameter | Value |
|---|---|
| Learning rate | 2e-5 |
| Batch size | 2 |
| Max steps | 100 |
| Framework | Unsloth + TRL SFTTrainer |
| Loss curve | 1.295 → 0.78 |
| Grad norm | ~1.85 |
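To put the table in perspective, the sketch below estimates how much of the training set 100 steps cover. It assumes a single device and no gradient accumulation (neither is stated in this card) and takes the 500-sample training set size from the Technical Details table. `samples_seen` is a hypothetical helper name, not part of any library.

```python
def samples_seen(batch_size: int, max_steps: int, grad_accum_steps: int = 1) -> int:
    """Training samples consumed on a single device."""
    return batch_size * grad_accum_steps * max_steps


# Batch size and step count come from the SFT table.
# grad_accum_steps=1 is an assumption; the card does not state it.
seen = samples_seen(batch_size=2, max_steps=100)

train_size = 500  # from the Technical Details table
epochs = seen / train_size

print(f"samples seen: {seen}, fraction of one epoch: {epochs:.2f}")
```

Under these assumptions the run covers less than one pass over the training data. A larger accumulation factor would scale the estimate proportionally.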
Technical Details
| Detail | Value |
|---|---|
| Base model | unsloth/Qwen3-VL-2B-Instruct (2.15B params) |
| LoRA rank (r) | 32 |
| LoRA alpha | 64 |
| LoRA dropout | 0.0 |
| Target modules | All linear layers |
| Trainable parameters | 2.18% of total |
| Precision | bf16 |
| Hardware | NVIDIA A40 48GB (RunPod) |
| Dataset | 500 train samples from blazeofchi/pdf-ocr-rl-dataset |
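The LoRA figures in the table can be related with some quick arithmetic. The sketch below assumes the standard LoRA scaling of `alpha / r` (not rank-stabilized LoRA) and uses a hypothetical 2048 x 2048 layer purely for illustration; the actual layer shapes are not listed in this card. The trainable-parameter estimate simply multiplies the two numbers from the table and is only a rough figure.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


scale = lora_scaling(r=32, alpha=64)

# Hypothetical layer shape, chosen only to show the order of magnitude.
per_layer = lora_params(d_in=2048, d_out=2048, r=32)

# Rough adapter size implied by "2.18% of total" and "2.15B params".
approx_trainable = 0.0218 * 2.15e9

print(f"scaling factor: {scale}")
print(f"params added to a 2048x2048 layer: {per_layer:,}")
print(f"approximate trainable params: {approx_trainable / 1e6:.1f}M")
```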
Purpose
This checkpoint serves two purposes:
- Intermediate checkpoint for GRPO: This is the starting point for Stage 2 (GRPO refinement). GRPO alone produces near-zero gradients on vision-language models, so an SFT warm-up is essential.
- SFT-only baseline: Compare against the full SFT+GRPO model to measure the contribution of GRPO refinement.
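The card does not explain why GRPO alone stalls. One mechanism that would be consistent with the claim (an assumption here, not something the source confirms) is that GRPO's group-relative advantage vanishes when every sampled completion in a group earns nearly the same reward. A minimal sketch of that normalization follows, with hypothetical reward values and the population standard deviation chosen for simplicity:

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-4):
    """Group-relative advantages: each reward is centered and scaled within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical cold start: every completion in the group scores the same.
cold_start = group_advantages([0.0, 0.0, 0.0, 0.0])

# Hypothetical post-SFT group: rewards differ, so advantages are non-zero.
after_sft = group_advantages([0.2, 0.6, 0.4, 0.8])

print(cold_start)
print([round(a, 2) for a in after_sft])
```

When all advantages in a group are zero, the policy-gradient term for that group contributes nothing, which is one way a warm-up that diversifies reward outcomes could help.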
Usage
# Requires: transformers (with Qwen3-VL support), peft, qwen-vl-utils, pillow, torch
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info
from PIL import Image
import torch

# Load the base model, then attach the LoRA adapter on top of it.
base_model = Qwen3VLForConditionalGeneration.from_pretrained(
    "unsloth/Qwen3-VL-2B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(
    base_model, "blazeofchi/pdf-ocr-rl-qwen3vl2b-sft-only"
)
processor = AutoProcessor.from_pretrained(
    "blazeofchi/pdf-ocr-rl-qwen3vl2b-sft-only"
)

# A rendered PDF page saved as an image.
image = Image.open("page.png")
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Convert this PDF page to well-structured markdown."}
    ]}
]

# Build the prompt text and extract the vision inputs from the messages.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs,
    padding=True, return_tensors="pt"
).to(model.device)

# Greedy decoding; slice off the prompt tokens before decoding the result.
output = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
result = processor.decode(
    output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(result)
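For documents with several pages, the message payload from the example above can be built once per page. The sketch below is only an illustration: `build_messages` is a hypothetical helper, and the file names are placeholders standing in for PIL images loaded with `Image.open`. Each resulting payload would then go through `apply_chat_template`, `process_vision_info`, and `generate` exactly as shown above.

```python
PROMPT = "Convert this PDF page to well-structured markdown."


def build_messages(image, prompt=PROMPT):
    """Build the single-turn chat payload from the usage example for one page image."""
    return [
        {"role": "user", "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt},
        ]}
    ]


# Placeholder page identifiers; in practice these would be rendered page images.
pages = ["page_1.png", "page_2.png"]
batch = [build_messages(page) for page in pages]

print(len(batch), "payloads;", "prompt:", batch[0][0]["content"][1]["text"])
```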
Related Models
| Model | Description |
|---|---|
| blazeofchi/pdf-ocr-rl-qwen3vl2b-sft-grpo | SFT + GRPO (best model) |
| blazeofchi/pdf-ocr-rl-qwen3vl2b-sft-only | SFT-only (this model) |
Citation
@misc{pdf-ocr-rl-2026,
  title={PDF-OCR-RL: Fine-tuning Vision-Language Models for PDF-to-Markdown with GRPO},
  author={Paras Sharma},
  year={2026},
  url={https://github.com/Parassharmaa/pdf-ocr-rl}
}
License
Apache 2.0 (same as base model)
Model tree for blazeofchi/pdf-ocr-rl-qwen3vl2b-sft-only
Base model
Qwen/Qwen3-VL-2B-Instruct
Dataset used to train blazeofchi/pdf-ocr-rl-qwen3vl2b-sft-only
Evaluation results
| Metric | Dataset | Value (self-reported) |
|---|---|---|
| Heading F1 | pdf-ocr-rl-dataset (test set) | 0.852 |
| Word F1 | pdf-ocr-rl-dataset (test set) | 0.720 |
| Edit Distance (Levenshtein) | pdf-ocr-rl-dataset (test set) | 0.745 |
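The card does not say how the metrics listed above are computed. As a rough illustration only, one common way to define a word-level F1 and a normalized Levenshtein similarity is sketched below. `word_f1` and `edit_similarity` are hypothetical names, and the project's actual evaluation code may tokenize or normalize differently.

```python
from collections import Counter


def word_f1(pred: str, ref: str) -> float:
    """Bag-of-words F1 between whitespace-tokenized prediction and reference."""
    p, r = pred.split(), ref.split()
    if not p or not r:
        return float(p == r)  # 1.0 only if both are empty
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance using a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def edit_similarity(pred: str, ref: str) -> float:
    """1 minus the edit distance normalized by the longer string's length."""
    longest = max(len(pred), len(ref))
    return 1.0 if longest == 0 else 1.0 - levenshtein(pred, ref) / longest


print(word_f1("# Title some text", "# Title other text"))
print(edit_similarity("kitten", "sitting"))
```

Under this definition higher is better for both scores, which matches reading the reported edit-distance figure as a similarity. That reading is an assumption.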