Making Reconstruction FID Predictive of Diffusion Generation FID
Paper: arXiv:2603.05630
This repository contains pretrained SiT (Scalable Interpolant Transformer) models and code for the paper Making Reconstruction FID Predictive of Diffusion Generation FID.
The paper introduces interpolated FID (iFID), a variant of reconstruction FID (rFID). Unlike standard rFID, iFID correlates strongly (~0.85) with the generation FID (gFID) of latent diffusion models.
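rFID, iFID, and gFID are all FID scores: each one is a Fréchet distance between two Gaussians fitted to image features, and they differ only in which images are compared. The sketch below shows that shared distance in plain NumPy. It is an illustration under stated assumptions, not the code used by `evalvae.py` or `generate.py`: the names `frechet_distance` and `feature_statistics` are hypothetical, and the features are assumed to be already extracted (in practice, typically from an Inception network) as an array of shape `(num_images, feature_dim)`.

```python
import numpy as np


def feature_statistics(features):
    """Mean vector and covariance matrix of a (num_images, feature_dim) array."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    return mu, sigma


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    d^2 = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 @ sigma2)^(1/2))
    """
    diff = mu1 - mu2
    # For positive semi-definite inputs, the eigenvalues of sigma1 @ sigma2 are
    # real and non-negative, so Tr((sigma1 @ sigma2)^(1/2)) is the sum of their
    # square roots. Clipping guards against tiny negative values from round-off.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    # Hypothetical stand-in features; real use would pass network activations.
    rng = np.random.default_rng(0)
    feats_a = rng.normal(size=(2000, 8))
    feats_b = rng.normal(loc=0.5, size=(2000, 8))
    print(frechet_distance(*feature_statistics(feats_a), *feature_statistics(feats_b)))
```

Under this view, the choice of the two image sets is what separates the metrics: reconstructions versus references for rFID, generated samples versus references for gFID. How iFID constructs its image set is defined in the paper and is not reproduced here.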
For detailed installation and setup, please refer to the GitHub repository.
To evaluate iFID for a VAE:
accelerate launch --num_processes=4 --gpu_ids="0,1,2,3" evalvae.py \
--seed=0 \
--sample-dir="./samples" \
--exp-name="ifid-sdvae" \
--dataset="./ImageNet/val" \
--dataset-ref="./ImageNet/train" \
--vae-config="./configs/SDVAE.yaml"
To sample from a trained SiT model (e.g., SiT-B for SD-VAE):
torchrun --nnodes=1 --nproc_per_node=4 --master_port 0 generate.py \
--num-fid-samples 50000 \
--mode sde \
--num-steps 250 \
--cfg-scale 1.0 \
--guidance-high 1.0 \
--guidance-low 0.0 \
--exp-path "./exps/sit-b-sdvae-400k" \
--fid-reference-file "VIRTUAL_imagenet256_labeled.npz" \
--train-steps 400000
@article{xu2025making,
title={Making Reconstruction FID Predictive of Diffusion Generation FID},
author={Xu, Tongda and He, Mingwei and Abu-Hussein, Shady and Hernandez-Lobato, Jose Miguel and Zhang, Haotian and Zhao, Kai and Zhou, Chao and Zhang, Ya-Qin and Wang, Yan},
journal={arXiv preprint arXiv:2603.05630},
year={2025}
}