StableI2I: Spotting Unintended Changes in Image-to-Image Transition
Abstract
StableI2I is a unified evaluation framework that assesses content fidelity and consistency in image-to-image tasks without requiring reference images, providing accurate and interpretable measurements correlated with human judgments.
In most real-world image-to-image (I2I) scenarios, existing evaluations focus primarily on instruction following and on the perceptual quality or aesthetics of the generated images. However, they largely fail to assess whether the output image preserves the semantic correspondence and spatial structure of the input image. To address this limitation, we propose StableI2I, a unified and dynamic evaluation framework that explicitly measures content fidelity and pre–post consistency across a wide range of I2I tasks, including image editing and image restoration, without requiring reference images. In addition, we construct StableI2I-Bench, a benchmark designed to systematically evaluate the accuracy of MLLMs on such fidelity and consistency assessment tasks. Extensive experimental results demonstrate that StableI2I provides accurate, fine-grained, and interpretable evaluations of content fidelity and consistency, with strong correlations to human subjective judgments. Our framework serves as a practical and reliable evaluation tool for diagnosing content consistency and benchmarking model performance in real-world I2I systems.
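The abstract does not spell out StableI2I's scoring procedure. As a minimal, hypothetical sketch of what a reference-free pre–post consistency check could look like (not the paper's actual method), one can compare the input and output images directly in regions the edit is supposed to leave untouched; the function name, mask convention, and scoring formula below are illustrative assumptions:

```python
# Toy illustration only -- NOT StableI2I's actual metric.
# Idea: a reference-free pre-post consistency score compares the input
# and output images directly, restricted to regions that the edit
# instruction says should be preserved.
import numpy as np


def consistency_score(src: np.ndarray, out: np.ndarray,
                      preserve_mask: np.ndarray) -> float:
    """Return a score in [0, 1]; 1.0 means preserved regions are unchanged.

    src, out: float images in [0, 1], shape (H, W, C).
    preserve_mask: boolean (H, W) array, True where the edit should
    NOT change the image.
    """
    # Per-pixel mean absolute difference across channels.
    diff = np.abs(src - out).mean(axis=-1)
    # Average the error only over pixels that should stay fixed.
    err = diff[preserve_mask].mean() if preserve_mask.any() else 0.0
    return float(1.0 - err)


# Example: an "edit" that unintentionally repaints the left half.
rng = np.random.default_rng(0)
src = rng.random((8, 8, 3))
out = src.copy()
out[:, :4] = rng.random((8, 4, 3))   # unintended texture repainting
mask = np.zeros((8, 8), dtype=bool)
mask[:, :4] = True                   # left half should have been preserved
```

In practice one would use a perceptual or structural similarity measure rather than raw pixel differences, and the paper's framework additionally relies on MLLM-based judgments; this sketch only captures the reference-free input–output comparison idea.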
Community
[ICML 2026] The first model for evaluating fidelity in image-to-image tasks. It assesses whether the generated image suffers from content errors, texture repainting, or other unintended changes, helping ensure consistency in regions that should be preserved.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- GEditBench v2: A Human-Aligned Benchmark for General Image Editing (2026)
- Can Nano Banana 2 Replace Traditional Image Restoration Models? An Evaluation of Its Performance on Image Restoration Tasks (2026)
- RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models (2026)
- DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning (2026)
- RefReward-SR: LR-Conditioned Reward Modeling for Preference-Aligned Super-Resolution (2026)
- TexEditor: Structure-Preserving Text-Driven Texture Editing (2026)
- OARS: Process-Aware Online Alignment for Generative Real-World Image Super-Resolution (2026)
Get this paper in your agent:
hf papers read 2605.04453
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash