Update README.md
#1
by aaronmat1905 - opened
README.md
CHANGED
@@ -1,14 +1,130 @@
- ---
- language:

# chart-vision-qwen

**Qwen2-VL-2B-Instruct fine-tuned on ChartQA** using LoRA adapters for chart question answering.

*Submitted as a deliverable for the Orange Problem in the NLP with DL course (UE23AM343BB1).*

**Team:** Langrangers (PES University)
- Aaron Thomas Mathew (PES1UG23AM005)
- Aman Kumar Mishra (PES1UG23AM040)
- Preetham VJ (PES1UG23AM913)

**GitHub Repository:** [Aman-K-Mishra/orange-chartqa-slm](https://github.com/Aman-K-Mishra/orange-chartqa-slm)

---

## Model Description

Given a chart image (bar chart, line chart, pie chart, etc.) and a natural language question, this model predicts the answer. It was fine-tuned from [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) using LoRA (Low-Rank Adaptation) on the full ChartQA training split (28,299 samples).

| Property | Value |
|---|---|
| Base model | Qwen2-VL-2B-Instruct |
| Fine-tuning method | LoRA (PEFT) |
| Dataset | HuggingFaceM4/ChartQA |
| Training samples | 28,299 |
| Trainable parameters | 4.36M (0.20% of 2.21B) |
| Hardware | Tesla T4 (15.6 GB VRAM) |
| Epochs | 1 |

---

## Training Details

### LoRA Configuration
| Parameter | Value | Reason |
|---|---|---|
| Rank (`r`) | 16 | rank=8 too small for chart reasoning; rank=32 risks OOM |
| Alpha | 32 | Common `alpha = 2 × rank` heuristic |
| Dropout | 0.05 | Light regularisation to prevent adapter overfitting |
| Target modules | q_proj, k_proj, v_proj, o_proj | Most impactful attention projections for VLMs |
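
The 4.36M trainable-parameter figure can be reproduced with simple arithmetic from the rank and target modules above. This is a sketch under stated assumptions: the decoder dimensions (hidden size 1536, 28 layers, 2 key/value heads of dimension 128) are assumed values for the Qwen2-VL-2B language model and are not given in this README, and the adapters are assumed to attach only to the decoder's attention projections.

```python
# Assumed Qwen2-VL-2B decoder dimensions (not stated in this README).
HIDDEN = 1536
NUM_LAYERS = 28
KV_DIM = 2 * 128  # 2 key/value heads x head dimension 128

RANK = 16  # LoRA rank from the table above


def lora_params(in_features, out_features, rank=RANK):
    """A LoRA pair adds A (rank x in) and B (out x rank) matrices."""
    return rank * (in_features + out_features)


per_layer = (
    lora_params(HIDDEN, HIDDEN)    # q_proj
    + lora_params(HIDDEN, KV_DIM)  # k_proj
    + lora_params(HIDDEN, KV_DIM)  # v_proj
    + lora_params(HIDDEN, HIDDEN)  # o_proj
)
total = per_layer * NUM_LAYERS

print(f"{total:,} trainable parameters")        # 4,358,144
print(f"{100 * total / 2.21e9:.2f}% of 2.21B")  # 0.20%
```

Under these assumptions the count matches the reported 4.36M, which suggests the adapters sit in the language decoder rather than the vision tower.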

### Training Hyperparameters
| Parameter | Value | Reason |
|---|---|---|
| Batch size | 1 | OOM fix for T4 with max_length=768 |
| Gradient accumulation | 16 steps | Effective batch = 16 |
| Learning rate | 2e-4 | Standard for LoRA fine-tuning |
| Max sequence length | 768 | Compromise: 512 too short, 1024 causes OOM |
| Quantization | 8-bit (BitsAndBytes) | Full precision ≈ 16 GB; 8-bit ≈ 8 GB, safe for T4 |
| Image resolution | 256–512 patches (28×28) | Matches Qwen2-VL patch size; T4-safe |
| LR scheduler | Cosine annealing | Smooth decay over full epoch |
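
To make the interplay of batch size, gradient accumulation, and the cosine schedule concrete, the sketch below works out the number of optimizer steps in one epoch and the learning rate at a few points. It assumes no warmup, decay to zero, and that the last partial accumulation window still triggers an update; none of these details are stated in the table, so treat the numbers as illustrative.

```python
import math

NUM_SAMPLES = 28_299
BATCH_SIZE = 1
GRAD_ACCUM = 16
BASE_LR = 2e-4

effective_batch = BATCH_SIZE * GRAD_ACCUM
# Assumption: the final partial accumulation window still triggers an update.
total_steps = math.ceil(NUM_SAMPLES / effective_batch)


def cosine_lr(step, total=total_steps, base_lr=BASE_LR):
    """Cosine annealing from base_lr to 0, with no warmup (assumed)."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * step / total))


print(effective_batch, total_steps)  # 16 1769
for step in (0, total_steps // 2, total_steps):
    print(step, f"{cosine_lr(step):.2e}")
```

With a single epoch, the schedule spends roughly half of its updates above half the base learning rate, which is worth keeping in mind when reading the loss curve.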

---

## How to Use

### Installation

```bash
pip install transformers peft bitsandbytes accelerate datasets pillow
```

### Load Adapters and Run Inference

```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration, BitsAndBytesConfig
from peft import PeftModel
from PIL import Image
import torch

BASE_MODEL_ID = "Qwen/Qwen2-VL-2B-Instruct"
ADAPTER_REPO = "preethamvj/chart-vision-qwen"

# Load the base model in 8-bit (needs a CUDA GPU and bitsandbytes)
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Attach the LoRA adapters
model = PeftModel.from_pretrained(model, ADAPTER_REPO)

# Optional: merge adapters into the base weights for faster inference.
# Skip this line to keep the adapters separate; merging into 8-bit
# weights can change outputs slightly because of rounding.
model = model.merge_and_unload()
model.eval()

# Image-size bounds from the training setup
processor = AutoProcessor.from_pretrained(
    BASE_MODEL_ID,
    min_pixels=256 * 28 * 28,
    max_pixels=512 * 28 * 28,
)

# Inference
image = Image.open("your_chart.png").convert("RGB")
question = "What is the highest value in the chart?"

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": question},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, so the prompt is not echoed
new_tokens = output[0][inputs["input_ids"].shape[1]:]
answer = processor.decode(new_tokens, skip_special_tokens=True)
print("Answer:", answer.strip())
```
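
The `min_pixels` and `max_pixels` arguments bound how many visual tokens each chart produces. The sketch below approximates that resize step in plain Python, assuming each visual token covers a 28×28 pixel area and that image sides are snapped to multiples of 28. It is a simplified illustration, not the processor's actual code; `approx_resize` and `visual_tokens` are hypothetical helper names.

```python
import math

FACTOR = 28  # assumed side length, in pixels, of one visual token
MIN_PIXELS = 256 * FACTOR * FACTOR
MAX_PIXELS = 512 * FACTOR * FACTOR


def approx_resize(height, width):
    """Snap sides to multiples of 28, then rescale to fit the pixel budget."""
    h = max(FACTOR, round(height / FACTOR) * FACTOR)
    w = max(FACTOR, round(width / FACTOR) * FACTOR)
    if h * w > MAX_PIXELS:
        # Too large: shrink both sides by the same factor, rounding down.
        scale = math.sqrt(height * width / MAX_PIXELS)
        h = math.floor(height / scale / FACTOR) * FACTOR
        w = math.floor(width / scale / FACTOR) * FACTOR
    elif h * w < MIN_PIXELS:
        # Too small: enlarge both sides by the same factor, rounding up.
        scale = math.sqrt(MIN_PIXELS / (height * width))
        h = math.ceil(height * scale / FACTOR) * FACTOR
        w = math.ceil(width * scale / FACTOR) * FACTOR
    return h, w


def visual_tokens(height, width):
    """Number of 28x28 cells in the resized image."""
    h, w = approx_resize(height, width)
    return (h // FACTOR) * (w // FACTOR)


print(approx_resize(800, 1000), visual_tokens(800, 1000))  # (560, 700) 500
```

Under these assumptions, an 800×1000 chart is downscaled to about 500 visual tokens, which leaves room for the question and answer within the 768-token sequence limit used in training.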

---

## Intended Use

This model is intended for chart question answering tasks: reading values, trends, comparisons, and facts from chart images. It is not designed for general visual question answering outside the chart domain.

---

## Limitations

- Trained for only 1 epoch due to compute constraints (T4 GPU)
- Loss shows high variance across steps, suggesting the learning rate may benefit from tuning in future runs
- Performance may degrade on chart types not well-represented in ChartQA (e.g., highly complex infographics)