Seeing Through Words: Controlling Visual Retrieval Quality with Language Models
Why QCQC?
Text-to-image retrieval usually optimizes for relevance only. In practice you often care about quality too: more aesthetic photos, fewer blurry or low-IQA images, or a custom trade-off. We call this Quality-Controllable Retrieval (QCR), a new setting where retrieval can be explicitly conditioned on user-defined quality requirements.
We propose Quality-Conditioned Query Completion (QCQC), a query completion framework that leverages LLMs to enrich short queries with quality-aware descriptive details. Specify desired quality (e.g., aesthetic, relevance, image quality), and QCQC completes your query so retrieval returns results that match both meaning and quality.
- Quality control: describe the desired quality as part of the query condition; no separate filters or post-hoc re-ranking.
- Multi-dimensional quality: aesthetic, image quality (IQA), and relevance scores, composable in one framework and adaptable to any quality definition.
- Reproducible: an MS-COCO workflow with a clear data pipeline and training/inference scripts.
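To make the idea concrete, here is a minimal sketch of quality-conditioned query completion with an off-the-shelf GPT-2 from Hugging Face Transformers. The bracketed quality prefix, the checkpoint name, and the decoding settings are illustrative assumptions, not the exact format used by the released QCQC model.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # swap in the fine-tuned QCQC checkpoint

# Desired quality condition; the "[aesthetic=high]" style prefix is an
# illustrative assumption, not the exact conditioning format of QCQC.
condition = "[aesthetic=high] [iqa=high]"
query = "a dog running on the beach"
inputs = tokenizer(f"{condition} {query}", return_tensors="pt")

# Generate quality-aware descriptive details that complete the query.
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```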
Overview
We use MS-COCO and GPT-2 as the running example: download data, build a search index, generate auxiliary quality scores (aesthetic, IQA, relevance), tokenize the data, train the QCQC model, and then run retrieval. The steps below walk through the full pipeline.
Environment Installation
bash ./src/setup_envir.sh
conda activate QCQC
Dataset Preparation
Download MS-COCO dataset
python ./src/download_coco.py
unzip ./coco_data/train2017.zip -d ./coco_data/
unzip ./coco_data/annotations_trainval2017.zip -d ./coco_data/
Build search index
CUDA_VISIBLE_DEVICES=0 python ./src/search_preparation.py
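Conceptually, index construction amounts to embedding every COCO image with CLIP and storing the normalized features in a nearest-neighbor index. The sketch below assumes Hugging Face CLIP and FAISS; ./src/search_preparation.py is the authoritative implementation and may differ in model choice and index type.

```python
import faiss
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image_paths = ["./coco_data/train2017/000000000009.jpg"]  # extend to all COCO images
features = []
with torch.no_grad():
    for path in image_paths:
        inputs = processor(images=Image.open(path), return_tensors="pt")
        feat = model.get_image_features(**inputs)
        features.append(torch.nn.functional.normalize(feat, dim=-1))

features = torch.cat(features).numpy()
index = faiss.IndexFlatIP(features.shape[1])  # inner product == cosine on normalized features
index.add(features)
faiss.write_index(index, "./coco_data/coco_clip.index")
```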
Auxiliary Data Generation
Quality conditioning relies on precomputed scores. Follow the steps below for each type.
Image Aesthetic Scores
Follow the setup in improved-aesthetic-predictor.
Install extra dependencies:
conda run -n QCQC pip install webdataset pytorch-lightning
Generate aesthetic scores:
CUDA_VISIBLE_DEVICES=0 python ./improved-aesthetic-predictor/simple_inference_coco.py
IQA Scores
Follow the setup in DeQA-Score. Create a separate environment:
conda create -yn DeQA python=3.10
conda activate DeQA
cd DeQA-Score
pip install -e .
pip install pycocotools numpy==1.26.4 protobuf
Generate IQA scores:
CUDA_VISIBLE_DEVICES=0 python ./src/evaluate/scorer_coco.py
Relevance Scores
Relevance scores are computed with CLIP. From the QCQC environment:
conda activate QCQC
CUDA_VISIBLE_DEVICES=0 python ./src/generate_relevance_scores.py
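As a rough illustration, the relevance of a caption to an image is the cosine similarity between CLIP text and image embeddings. The model name and file paths below are assumptions; ./src/generate_relevance_scores.py handles the full COCO pipeline and batching.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

caption = "a dog running on the beach"
image = Image.open("./coco_data/train2017/000000000009.jpg")

with torch.no_grad():
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    img_feat = torch.nn.functional.normalize(
        model.get_image_features(pixel_values=inputs["pixel_values"]), dim=-1)
    txt_feat = torch.nn.functional.normalize(
        model.get_text_features(input_ids=inputs["input_ids"],
                                attention_mask=inputs["attention_mask"]), dim=-1)

relevance = (img_feat * txt_feat).sum(dim=-1).item()  # cosine similarity in [-1, 1]
print(f"relevance: {relevance:.4f}")
```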
Training & Testing
1. Data tokenization
CUDA_VISIBLE_DEVICES=0 python ./src/run_tokenize.py
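A hypothetical sketch of what this step prepares: each training example serializes a quality condition, a short query, and the full descriptive caption into GPT-2 token ids. The field names and serialization format below are illustrative assumptions; ./src/run_tokenize.py defines the actual format.

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

example = {
    "query": "a dog on the beach",
    "caption": "a golden retriever running along a sunny beach at golden hour",
    "aesthetic": 6.8,   # precomputed aesthetic score (improved-aesthetic-predictor)
    "iqa": 4.2,         # precomputed IQA score (DeQA-Score)
    "relevance": 0.31,  # precomputed relevance score (CLIP)
}

# Serialize quality condition + short query + target caption into one training string.
text = (
    f"[aesthetic={example['aesthetic']:.1f}] [iqa={example['iqa']:.1f}] "
    f"[relevance={example['relevance']:.2f}] {example['query']} => {example['caption']}"
)
encoded = tokenizer(text, truncation=True, max_length=128, padding="max_length")
print(encoded["input_ids"][:20])
```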
2. Model training
Multi-GPU example (8 GPUs):
torchrun --nproc_per_node=8 --master_port=1221 ./src/train.py \
    --lr 2e-3 --warmup 100 --epochs 20 --bs 256 \
    --logstep 100 --evalstep 100 --savestep 100 \
    --project_name GPT2_COCO --run_name prompt_gpt2coco
3. Model testing
bash src/inference.sh
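Under the hood, quality-controlled retrieval boils down to embedding the completed query with CLIP and searching the image index built earlier. The sketch below reuses the assumed FAISS index and CLIP model from the preparation sketch; inference.sh wraps the real pipeline.

```python
import faiss
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
index = faiss.read_index("./coco_data/coco_clip.index")  # index from the preparation step

# A query already completed by QCQC with quality-aware details (example output).
completed_query = "a dog running on the beach, golden-hour light, sharp focus"
with torch.no_grad():
    inputs = processor(text=[completed_query], return_tensors="pt", padding=True)
    feat = torch.nn.functional.normalize(clip.get_text_features(**inputs), dim=-1)

scores, ids = index.search(feat.numpy(), 10)  # top-10 images matching meaning and quality
print(ids[0], scores[0])
```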
4. Upload to Hugging Face
cd ..
hf upload Johnny050407/QCQC QCQC
Pretrained Checkpoints and Processed Data
Pretrained checkpoints and preprocessed auxiliary data for MS-COCO are publicly available on Hugging Face:
https://huggingface.co/Johnny050407/QCQC/
Results
Qualitative examples of quality-conditioned retrieval:
Citation
If you use this code or idea in your work, please cite:
@inproceedings{JianglinQCQC2026,
  title     = {Seeing Through Words: Controlling Visual Retrieval Quality with Language Models},
  author    = {Jianglin Lu and Simon Jenni and Kushal Kafle and Jing Shi and Handong Zhao and Yun Fu},
  booktitle = {The Fourteenth International Conference on Learning Representations (ICLR)},
  year      = {2026},
  url       = {https://openreview.net/forum?id=yOEmEXmbV8},
}
Acknowledgement
We use the following open-source projects and thank their authors:
- improved-aesthetic-predictor for aesthetic quality evaluation
- DeQA-Score for IQA score prediction

