| --- |
| base_model: Qwen/Qwen2.5-VL-3B-Instruct |
| library_name: transformers |
| pipeline_tag: image-text-to-text |
| tags: |
| - qwen2.5-vl |
| - lora |
| - sft |
| - context-classification |
| - out-of-context-detection |
| - coinco |
| license: cc-by-4.0 |
| --- |
| |
| # COinCO Context Classification Models |
|
|
| **Authors:** Tianze Yang\*, Tyson Jordan\*, Ruitong Sun\*, Ninghao Liu, Jin Sun |
| \*Equal contribution |
| **Affiliation:** University of Georgia |
|
|
| ## Overview |
|
|
| Fine-grained context classification models for detecting **out-of-context objects** in images. Each model is a fully merged Qwen2.5-VL-3B-Instruct fine-tuned via LoRA on the [COinCO dataset](https://huggingface.co/datasets/COinCO/COinCO-dataset). |
|
|
| The models classify whether an object (marked by a red bounding box) is **in-context** or **out-of-context** based on three criteria: |
|
|
| | Model | Criterion | Description | |
| |-------|-----------|-------------| |
| | `co_occurrence/` | Co-occurrence | Whether the object can reasonably appear together with other objects in the scene | |
| | `location/` | Location | Whether the object is placed in a physically and contextually reasonable position | |
| | `size/` | Size | Whether the object's size is proportional and realistic relative to other objects | |
|
|
| ## How to Use |
|
|
| ```python |
| from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor |
| import torch |
| |
| # Choose a model: "co_occurrence", "location", or "size" |
| model_id = "COinCO/Context_Classification_Models" |
| subfolder = "co_occurrence" # or "location" or "size" |
| |
| model = Qwen2_5_VLForConditionalGeneration.from_pretrained( |
| model_id, |
| subfolder=subfolder, |
| torch_dtype=torch.float16, |
| device_map="auto", |
| ) |
| processor = AutoProcessor.from_pretrained(model_id, subfolder=subfolder) |
| ``` |
|
|
| ## Training Details |
|
|
| - **Base Model:** [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) |
| - **Method:** LoRA fine-tuning (merged into base model) |
| - **Dataset:** [COinCO](https://huggingface.co/datasets/COinCO/COinCO-dataset) inpainted images with multi-model consensus labels |
| - **Training Data:** ~5,000 samples per criterion from the training split |
| - **Epochs:** 3 |
| - **Learning Rate:** 2e-4 |
| - **LoRA Rank:** See adapter config for details |
|
|
| ## Evaluation Results |
|
|
| ### Inpainted Test Set (binary classification: In-context vs Out-of-context) |
|
|
| | Criterion | Baseline (Qwen2.5-VL-3B) | Fine-tuned | Improvement | |
| |-----------|--------------------------|------------|-------------| |
| | Co-occurrence | 75.54% | **80.82%** | +5.28% | |
| | Location | 74.43% | 71.05% | -3.38% | |
| | Size | 50.21% | **66.01%** | +15.80% | |
|
|
| ### Real COCO Images (shortcut learning detection, higher = less shortcut reliance) |
|
|
| | Criterion | Baseline | Fine-tuned | Improvement | |
| |-----------|----------|------------|-------------| |
| | Co-occurrence | 88.95% | 87.00% | -1.95% | |
| | Location | 47.55% | **91.35%** | +43.80% | |
| | Size | 52.55% | **83.20%** | +30.65% | |
|
|
| ## Related Resources |
|
|
| - **Paper:** "Common Inpainted Objects In-N-Out of Context" |
| - **Dataset:** [COinCO/COinCO-dataset](https://huggingface.co/datasets/COinCO/COinCO-dataset) |
| - **Code:** [YangTianze009/COinCO](https://github.com/YangTianze009/COinCO) |
|
|
| ## Citation |
|
|
| ```bibtex |
| @article{yang2025coinco, |
| title={Common Inpainted Objects In-N-Out of Context}, |
| author={Tianze Yang and Tyson Jordan and Ruitong Sun and Ninghao Liu and Jin Sun}, |
| year={2025} |
| } |
| ``` |
|
|