Making Reconstruction FID Predictive of Diffusion Generation FID

This repository contains pretrained SiT (Scalable Interpolant Transformer) models and code for the paper "Making Reconstruction FID Predictive of Diffusion Generation FID".

The paper introduces interpolated FID (iFID), a variant of reconstruction FID (rFID). Unlike standard rFID, iFID correlates strongly (about 0.85) with the generation FID (gFID) of latent diffusion models.
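
As background for the metrics named above, here is a minimal sketch of the standard Fréchet distance between Gaussian feature statistics, which is the quantity FID-style scores are built on. It is not the paper's iFID procedure: the interpolation step is described in the paper and is not reproduced here. The function name `frechet_distance` and the random stand-in features are assumptions for illustration; a real evaluation would use Inception activations of the image sets being compared.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (n_samples, dim) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^{1/2}) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


# Stand-in features: random vectors instead of Inception activations (assumption).
rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 8))
shifted = real + 0.5  # same covariance, mean shifted by 0.5 in each of 8 dims

print(frechet_distance(real, real))     # approximately 0
print(frechet_distance(real, shifted))  # approximately 8 * 0.5**2 = 2.0
```

In this sketch, rFID would correspond to comparing features of real images with features of their VAE reconstructions, and gFID to comparing real images with diffusion samples; the distance formula itself is the same in both cases.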

Usage

For detailed installation and setup, please refer to the GitHub repository.

iFID Evaluation

To evaluate iFID for a VAE:

accelerate launch --num_processes=4 --gpu_ids="0,1,2,3" evalvae.py \
    --seed=0 \
    --sample-dir="./samples" \
    --exp-name="ifid-sdvae" \
    --dataset="./ImageNet/val" \
    --dataset-ref="./ImageNet/train" \
    --vae-config="./configs/SDVAE.yaml"

Diffusion Generation

To sample from a trained SiT model (e.g., SiT-B for SD-VAE):

torchrun --nnodes=1 --nproc_per_node=4 --master_port 0 generate.py \
    --num-fid-samples 50000 \
    --mode sde \
    --num-steps 250 \
    --cfg-scale 1.0 \
    --guidance-high 1.0 \
    --guidance-low 0.0 \
    --exp-path "./exps/sit-b-sdvae-400k" \
    --fid-reference-file "VIRTUAL_imagenet256_labeled.npz" \
    --train-steps 400000

Citation

@article{xu2025making,
  title={Making Reconstruction FID Predictive of Diffusion Generation FID},
  author={Xu, Tongda and He, Mingwei and Abu-Hussein, Shady and Hernandez-Lobato, Jose Miguel and Zhang, Haotian and Zhao, Kai and Zhou, Chao and Zhang, Ya-Qin and Wang, Yan},
  journal={arXiv preprint arXiv:2603.05630},
  year={2025}
}