COinCO
/

Context_Classification_Models

Image-Text-to-Text

context-classification

out-of-context-detection

Model card Files Files and versions

Context_Classification_Models / README.md

ruitongs's picture

Upload README.md with huggingface_hub

69c10fa verified 9 days ago

|

history blame contribute delete

3.33 kB

	---
	base_model: Qwen/Qwen2.5-VL-3B-Instruct
	library_name: transformers
	pipeline_tag: image-text-to-text
	tags:
	- qwen2.5-vl
	- lora
	- sft
	- context-classification
	- out-of-context-detection
	- coinco
	license: cc-by-4.0
	---

	# COinCO Context Classification Models

	Authors: Tianze Yang\, Tyson Jordan\, Ruitong Sun\*, Ninghao Liu, Jin Sun
	\*Equal contribution
	Affiliation: University of Georgia

	## Overview

	Fine-grained context classification models for detecting out-of-context objects in images. Each model is a fully merged Qwen2.5-VL-3B-Instruct fine-tuned via LoRA on the [COinCO dataset](https://huggingface.co/datasets/COinCO/COinCO-dataset).

	The models classify whether an object (marked by a red bounding box) is in-context or out-of-context based on three criteria:

	\| Model \| Criterion \| Description \|
	\|-------\|-----------\|-------------\|
	\| `co_occurrence/` \| Co-occurrence \| Whether the object can reasonably appear together with other objects in the scene \|
	\| `location/` \| Location \| Whether the object is placed in a physically and contextually reasonable position \|
	\| `size/` \| Size \| Whether the object's size is proportional and realistic relative to other objects \|

	## How to Use

	```python
	from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
	import torch

	# Choose a model: "co_occurrence", "location", or "size"
	model_id = "COinCO/Context_Classification_Models"
	subfolder = "co_occurrence" # or "location" or "size"

	model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
	model_id,
	subfolder=subfolder,
	torch_dtype=torch.float16,
	device_map="auto",
	)
	processor = AutoProcessor.from_pretrained(model_id, subfolder=subfolder)
	```

	## Training Details

	- Base Model: [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)
	- Method: LoRA fine-tuning (merged into base model)
	- Dataset: [COinCO](https://huggingface.co/datasets/COinCO/COinCO-dataset) inpainted images with multi-model consensus labels
	- Training Data: ~5,000 samples per criterion from the training split
	- Epochs: 3
	- Learning Rate: 2e-4
	- LoRA Rank: See adapter config for details

	## Evaluation Results

	### Inpainted Test Set (binary classification: In-context vs Out-of-context)

	\| Criterion \| Baseline (Qwen2.5-VL-3B) \| Fine-tuned \| Improvement \|
	\|-----------\|--------------------------\|------------\|-------------\|
	\| Co-occurrence \| 75.54% \| 80.82% \| +5.28% \|
	\| Location \| 74.43% \| 71.05% \| -3.38% \|
	\| Size \| 50.21% \| 66.01% \| +15.80% \|

	### Real COCO Images (shortcut learning detection, higher = less shortcut reliance)

	\| Criterion \| Baseline \| Fine-tuned \| Improvement \|
	\|-----------\|----------\|------------\|-------------\|
	\| Co-occurrence \| 88.95% \| 87.00% \| -1.95% \|
	\| Location \| 47.55% \| 91.35% \| +43.80% \|
	\| Size \| 52.55% \| 83.20% \| +30.65% \|

	## Related Resources

	- Paper: "Common Inpainted Objects In-N-Out of Context"
	- Dataset: [COinCO/COinCO-dataset](https://huggingface.co/datasets/COinCO/COinCO-dataset)
	- Code: [YangTianze009/COinCO](https://github.com/YangTianze009/COinCO)

	## Citation

	```bibtex
	@article{yang2025coinco,
	title={Common Inpainted Objects In-N-Out of Context},
	author={Tianze Yang and Tyson Jordan and Ruitong Sun and Ninghao Liu and Jin Sun},
	year={2025}
	}
	```