preethamvj/chart-vision-qwen

@@ -1,14 +1,130 @@
 ---
-license: mit
-datasets:
-- HuggingFaceM4/ChartQA
-language:
-- en
-metrics:
-- accuracy
-base_model:
-- Qwen/Qwen2-VL-2B-Instruct
-library_name: transformers
-tags:
-- code
----

+# chart-vision-qwen
+**Qwen2-VL-2B-Instruct fine-tuned on ChartQA** using LoRA adapters for chart question answering.
+*Submitted as a deliverable for the Orange Problem in the NLP with DL course (UE23AM343BB1).*
+**Team:** Langrangers (PES University)
+- Aaron Thomas Mathew — PES1UG23AM005
+- Aman Kumar Mishra — PES1UG23AM040
+- Preetham VJ — PES1UG23AM913
+**GitHub Repository:** [Aman-K-Mishra/orange-chartqa-slm](https://github.com/Aman-K-Mishra/orange-chartqa-slm)
 ---
+## Model Description
+Given a chart image (bar chart, line chart, pie chart, etc.) and a natural language question, this model predicts the answer. It was fine-tuned from [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) using LoRA (Low-Rank Adaptation) on the full ChartQA training split (28,299 samples).
+| Property | Value |
+|---|---|
+| Base model | Qwen2-VL-2B-Instruct |
+| Fine-tuning method | LoRA (PEFT) |
+| Dataset | HuggingFaceM4/ChartQA |
+| Training samples | 28,299 |
+| Trainable parameters | 4.36M (0.20% of 2.21B) |
+| Hardware | Tesla T4 (15.6 GB VRAM) |
+| Epochs | 1 |
+---
+## Training Details
+### LoRA Configuration
+| Parameter | Value | Reason |
+|---|---|---|
+| Rank (`r`) | 16 | rank=8 too small for chart reasoning; rank=32 risks OOM |
+| Alpha | 32 | Standard `alpha = 2×rank` rule |
+| Dropout | 0.05 | Light regularisation to prevent adapter overfitting |
+| Target modules | q_proj, k_proj, v_proj, o_proj | Most impactful attention projections for VLMs |
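The trainable-parameter figure in the Model Description table can be cross-checked against the LoRA settings above. This is a minimal sketch, not the authors' code. The layer dimensions (hidden size 1536, 28 decoder layers, 2 key/value heads, head dimension 128) are assumptions taken from the base model's published configuration rather than from this card, and the sketch assumes the adapters attach only to the language-model attention layers.

```python
# Assumed dimensions of the Qwen2-VL-2B-Instruct language model
# (taken from the base model's config, not stated in this card).
HIDDEN_SIZE = 1536
NUM_LAYERS = 28
NUM_KV_HEADS = 2
HEAD_DIM = 128
RANK = 16  # LoRA rank from the table above


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A LoRA adapter adds A (rank x in) and B (out x rank) matrices."""
    return rank * (in_features + out_features)


# Grouped-query attention: k_proj and v_proj project to a narrower width.
kv_dim = NUM_KV_HEADS * HEAD_DIM

per_layer = (
    lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)    # q_proj
    + lora_params(HIDDEN_SIZE, kv_dim, RANK)       # k_proj
    + lora_params(HIDDEN_SIZE, kv_dim, RANK)       # v_proj
    + lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)  # o_proj
)
total = per_layer * NUM_LAYERS

print(f"{total:,} trainable parameters (~{total / 1e6:.2f}M)")
print(f"{100 * total / 2.21e9:.2f}% of 2.21B")
```

Under these assumptions the count comes to 4,358,144, which agrees with the 4.36M (0.20%) reported in the table.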
+### Training Hyperparameters
+| Parameter | Value | Reason |
+|---|---|---|
+| Batch size | 1 | OOM fix for T4 with max_length=768 |
+| Gradient accumulation | 16 steps | Effective batch = 16 |
+| Learning rate | 2e-4 | Standard for LoRA fine-tuning |
+| Max sequence length | 768 | Compromise: 512 too short, 1024 causes OOM |
+| Quantization | 8-bit (BitsAndBytes) | Full precision ≈ 16 GB; 8-bit ≈ 8 GB, safe for T4 |
+| Image resolution | 256–512 patches (28×28) | Matches Qwen2-VL patch size; T4-safe |
+| LR scheduler | Cosine annealing | Smooth decay over full epoch |
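The numbers in the table imply a few derived quantities: the effective batch size, the number of optimizer steps in the single epoch, the shape of the learning-rate decay, and the pixel budget behind the patch counts. The sketch below works them out. It is illustrative only: it assumes no warmup, a decay to zero, and that a final partial accumulation group still counts as one update, none of which is stated in this card.

```python
import math

TRAIN_SAMPLES = 28_299
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 16
PEAK_LR = 2e-4
EPOCHS = 1

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
# Assumption: a final partial accumulation group still triggers an update.
total_steps = math.ceil(TRAIN_SAMPLES / effective_batch) * EPOCHS


def cosine_lr(step: int, total: int = total_steps, peak: float = PEAK_LR) -> float:
    """Cosine annealing from the peak rate to zero, with no warmup (assumed)."""
    progress = min(step, total) / total
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


# Image budget from the table: 256 to 512 patches of 28x28 pixels each.
PATCH = 28
min_pixels = 256 * PATCH * PATCH
max_pixels = 512 * PATCH * PATCH

print("effective batch:", effective_batch)
print("optimizer steps:", total_steps)
print("pixel budget:", min_pixels, "to", max_pixels)
print("lr at start, midpoint, end:",
      cosine_lr(0), cosine_lr(total_steps // 2), cosine_lr(total_steps))
```

With these assumptions the epoch is 1,769 optimizer steps, and the pixel budget is 200,704 to 401,408 pixels per image, the same values passed as `min_pixels` and `max_pixels` when the processor is loaded.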
+---
+## How to Use
+### Installation
+```bash
+pip install transformers peft bitsandbytes accelerate datasets pillow
+```
+### Load Adapters and Run Inference
+```python
+from transformers import AutoProcessor, Qwen2VLForConditionalGeneration, BitsAndBytesConfig
+from peft import PeftModel
+from PIL import Image
+import torch
+BASE_MODEL_ID = "Qwen/Qwen2-VL-2B-Instruct"
+ADAPTER_REPO  = "preethamvj/chart-vision-qwen"
+# Load base model in 8-bit
+bnb_config = BitsAndBytesConfig(load_in_8bit=True)
+model = Qwen2VLForConditionalGeneration.from_pretrained(
+    BASE_MODEL_ID,
+    quantization_config=bnb_config,
+    device_map="auto",
+    torch_dtype=torch.float16
+)
+# Load LoRA adapters
+model = PeftModel.from_pretrained(model, ADAPTER_REPO)
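+# Optional: merge adapters into base weights for faster inference.
+# Merging into 8-bit weights can introduce small rounding differences;
+# remove this line to keep the adapters separate.
+model = model.merge_and_unload()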
+processor = AutoProcessor.from_pretrained(
+    BASE_MODEL_ID,
+    min_pixels=256 * 28 * 28,
+    max_pixels=512 * 28 * 28
+)
+# Inference
+image = Image.open("your_chart.png").convert("RGB")
+question = "What is the highest value in the chart?"
+messages = [{
+    "role": "user",
+    "content": [
+        {"type": "image", "image": image},
+        {"type": "text",  "text": question}
+    ]
+}]
+# Build the chat-formatted prompt, then tokenize text and image together
+text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
+with torch.no_grad():
+    output = model.generate(**inputs, max_new_tokens=64)
+# Decode only the newly generated tokens, not the echoed prompt
+generated = output[0][inputs["input_ids"].shape[1]:]
+answer = processor.decode(generated, skip_special_tokens=True)
+print("Answer:", answer.strip())
+```
+---
+## Intended Use
+This model is intended for chart question answering tasks — reading values, trends, comparisons, and facts from chart images. It is not designed for general visual question answering outside the chart domain.
+---
+## Limitations
+- Trained for only 1 epoch due to compute constraints (T4 GPU)
+- Loss shows high variance across steps, suggesting the learning rate may benefit from tuning in future runs
+- Performance may degrade on chart types not well-represented in ChartQA (e.g., highly complex infographics)