AWARES-Qwen2.5-VL-7B

Paper | Code

AWARES (Adaptive Resolution with Active REqueStingS) is a fine-tuned Qwen2.5-VL-7B-Instruct model, trained with GRPO (Group Relative Policy Optimization) to request high-resolution image crops only when it needs them to answer a visual question.

How It Works

Given a low-resolution image and a question, the model decides whether it needs more visual detail. If so, it emits a GET_CROPS tool call specifying which region(s) to zoom into. The high-res crops are then provided back, and the model produces its final answer.
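The two-turn flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the textual tool-call form `GET_CROPS(0, 4)` is an assumed syntax, and `generate` and `get_crops` are hypothetical callables standing in for the model call and the crop extraction step. The real pipeline runs the second step as a second conversational turn; here it is simplified to a second call.

```python
import re
from typing import Callable, List, Optional


def parse_get_crops(text: str) -> Optional[List[str]]:
    """Return the requested crop indices, or None if there is no tool call.

    ASSUMPTION: the tool call appears as the literal text "GET_CROPS"
    followed on the same line by single-digit indices, e.g. "GET_CROPS(0, 4)".
    The actual format is defined in the AwaRes repository.
    """
    _, found, tail = text.partition("GET_CROPS")
    if not found:
        return None
    lines = tail.splitlines()
    first_line = lines[0] if lines else ""
    indices: List[str] = []
    for idx in re.findall(r"\b[0-8]\b", first_line):
        if idx not in indices:  # drop duplicates, keep request order
            indices.append(idx)
    return indices or None


def answer(
    question: str,
    low_res_image: object,
    generate: Callable[[str, list], str],      # hypothetical model call
    get_crops: Callable[[List[str]], list],    # hypothetical crop extractor
) -> str:
    """Answer directly, or fetch high-res crops first if the model asks."""
    first = generate(question, [low_res_image])
    indices = parse_get_crops(first)
    if indices is None:
        return first  # the low-res image was enough
    crops = get_crops(indices)
    return generate(question, [low_res_image] + crops)
```

Passing the model call and the crop extractor in as arguments keeps the control flow testable without loading the 7B model.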

Crop regions. The model can request any of 9 predefined crop indices:

CROPS_MAP = {
    '0': 'top-left', '1': 'top-right', '2': 'bottom-left', '3': 'bottom-right',
    '4': 'center', '5': 'top', '6': 'bottom', '7': 'left', '8': 'right',
}
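To make the mapping concrete, the sketch below turns a crop index into a pixel box on the full-resolution image. The geometry is an assumption made for illustration (quadrants and halves each span half of the relevant dimension, and `center` is a half-size window in the middle); the exact regions used by AWARES are defined in the AwaRes repository. `ASSUMED_FRACTIONS` and `crop_box` are hypothetical names.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels

CROPS_MAP: Dict[str, str] = {
    '0': 'top-left', '1': 'top-right', '2': 'bottom-left', '3': 'bottom-right',
    '4': 'center', '5': 'top', '6': 'bottom', '7': 'left', '8': 'right',
}

# ASSUMED geometry, expressed as fractions of image width and height.
# These values are illustrative and are not taken from the AwaRes code.
ASSUMED_FRACTIONS: Dict[str, Tuple[float, float, float, float]] = {
    'top-left': (0.0, 0.0, 0.5, 0.5),
    'top-right': (0.5, 0.0, 1.0, 0.5),
    'bottom-left': (0.0, 0.5, 0.5, 1.0),
    'bottom-right': (0.5, 0.5, 1.0, 1.0),
    'center': (0.25, 0.25, 0.75, 0.75),
    'top': (0.0, 0.0, 1.0, 0.5),
    'bottom': (0.0, 0.5, 1.0, 1.0),
    'left': (0.0, 0.0, 0.5, 1.0),
    'right': (0.5, 0.0, 1.0, 1.0),
}


def crop_box(index: str, width: int, height: int) -> Box:
    """Map a crop index to a pixel box for an image of the given size."""
    left, upper, right, lower = ASSUMED_FRACTIONS[CROPS_MAP[index]]
    return (
        round(left * width),
        round(upper * height),
        round(right * width),
        round(lower * height),
    )
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so `image.crop(crop_box('4', *image.size))` would extract the assumed center region.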

Quick Start

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Kimhi/AWARES-Qwen2.5-VL-7B", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Kimhi/AWARES-Qwen2.5-VL-7B")

For full usage examples (multi-turn inference with GET_CROPS, crop extraction, etc.) and evaluation scripts, see the AwaRes repository.

Evaluation

We provide a custom lmms-eval model type (qwen2_5_vl_awares) that runs the full AWARES multi-turn pipeline automatically: low-res input, GET_CROPS parsing, crop extraction, and second-turn generation with KV-cache reuse.

See the evaluation instructions in the AwaRes repository for setup and benchmarking details.

Training Details

  • Base model: Qwen2.5-VL-7B-Instruct
  • Training method: GRPO (Group Relative Policy Optimization) with LoRA adapters, merged into the base weights after training
  • Reward signals: Text similarity, crop cost penalty, LLM-as-a-judge
  • Framework: Custom TRL fork + DeepSpeed ZeRO-2
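
For readers unfamiliar with GRPO, the sketch below shows the group-relative advantage computation that gives the method its name: several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its group. This follows the standard GRPO formulation and is not confirmed to match the authors' TRL fork. How the three reward signals are combined into one scalar is not described here, so the function takes the scalar rewards as given. The name `group_relative_advantages`, the use of the population standard deviation, and the `eps` value are illustrative choices.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its own sampling group.

    rewards: one scalar reward per sampled response to the same prompt.
    eps: small constant that avoids division by zero when all rewards match.
    """
    group_mean = mean(rewards)
    group_std = pstdev(rewards)  # population std; an illustrative choice
    return [(r - group_mean) / (group_std + eps) for r in rewards]
```

Responses that score above their group's average get a positive advantage and are reinforced; those below average get a negative one. If every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal.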

Citation

@article{shabtay2026awares,
  title={Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs},
  author={Shabtay, Nimrod and Kimhi, Moshe and Spector, Artem and Haray, Sivan and Rivlin, Ehud and Baskin, Chaim and Giryes, Raja and Schwartz, Eli},
  journal={arXiv preprint arXiv:2603.16932},
  year={2026}
}