Diffusion Transformer (DiT): CelebA-HQ Face Generation

A Diffusion Transformer (DiT) trained on CelebA-HQ for unconditional face image generation at 128x128 resolution. The model uses a Vision Transformer backbone in place of the conventional U-Net for the denoising network, operating in the latent space of a VQ-VAE.

Model Description

This is a two-stage pipeline:

  1. Stage 1 (VQ-VAE): Compresses 128x128 RGB images into a 4-channel discrete latent space with a codebook of 8192 entries.
  2. Stage 2 (DiT): A transformer-based denoising model that operates on flattened image patches in the VQ-VAE latent space.

DiT Architecture

The DiT (Peebles & Xie, 2023) replaces the U-Net backbone with a standard Vision Transformer (ViT) encoder. Each image latent is divided into non-overlapping patches, and each patch is linearly embedded into a token. The tokens are processed by a stack of transformer blocks, with timestep conditioning applied through adaptive layer normalization (adaLN).
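The patch embedding and timestep conditioning described above can be sketched in NumPy. This is a minimal illustration, not the repository's code: the function names (`patchify`, `adaln_modulate`), the random weights, and the zero-initialized modulation matrices are all assumptions chosen for the example, and attention and MLP layers are left out entirely.

```python
import numpy as np

def patchify(latent, patch_size):
    """Split a (C, H, W) latent into non-overlapping patches, one flattened row per patch."""
    c, h, w = latent.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    x = latent.reshape(c, gh, patch_size, gw, patch_size)
    x = x.transpose(1, 3, 0, 2, 4)  # (gh, gw, C, p, p)
    return x.reshape(gh * gw, c * patch_size * patch_size)

def layer_norm(x, eps=1e-6):
    """Layer norm over the feature axis, without learned scale or bias."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def adaln_modulate(tokens, t_emb, w_shift, w_scale):
    """Adaptive layer norm: shift and scale are regressed from the time embedding."""
    shift = t_emb @ w_shift  # (hidden,)
    scale = t_emb @ w_scale  # (hidden,)
    return layer_norm(tokens) * (1.0 + scale) + shift

rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 16, 16))          # 4-channel latent, 16x16 assumed
patches = patchify(latent, patch_size=2)           # (64, 16)
w_embed = rng.standard_normal((16, 768)) * 0.02    # hypothetical linear patch embedding
tokens = patches @ w_embed                         # (64, 768)

t_emb = rng.standard_normal(768)                   # stand-in for a timestep embedding
w_shift = np.zeros((768, 768))                     # zero init: modulation starts as identity
w_scale = np.zeros((768, 768))
out = adaln_modulate(tokens, t_emb, w_shift, w_scale)
```

With the modulation weights at zero, `out` equals the plain layer-normalized tokens. Once the weights are trained, the time embedding shifts and scales every token.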

| Parameter | Value |
|---|---|
| Patch size | 2 |
| Transformer layers | 12 |
| Hidden dimension | 768 |
| Attention heads | 12 |
| Head dimension | 64 |
| Time embedding dim | 768 |
| Input resolution | 128x128 (latent: ~16x16x4) |
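As a quick consistency check on these hyperparameters, the values can be collected in a small config object. The `DiTConfig` class and its field names are hypothetical (the repository's actual configuration lives in its YAML file), and the 16x16 latent size is taken from the approximate figure listed above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DiTConfig:
    # Values copied from the table above; the class itself is illustrative.
    patch_size: int = 2
    num_layers: int = 12
    hidden_dim: int = 768
    num_heads: int = 12
    head_dim: int = 64
    time_emb_dim: int = 768
    latent_size: int = 16      # assumed from "~16x16x4"
    latent_channels: int = 4

    @property
    def num_tokens(self) -> int:
        """Sequence length seen by the transformer."""
        return (self.latent_size // self.patch_size) ** 2

    @property
    def patch_dim(self) -> int:
        """Number of values in one flattened patch."""
        return self.latent_channels * self.patch_size ** 2

cfg = DiTConfig()
# Multi-head attention splits the hidden dimension evenly across heads.
assert cfg.num_heads * cfg.head_dim == cfg.hidden_dim
print(cfg.num_tokens, cfg.patch_dim)
```

Under these assumptions the transformer sees a short sequence of 64 tokens per image, each built from 16 latent values.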

VQ-VAE Architecture

| Parameter | Value |
|---|---|
| Latent channels (z) | 4 |
| Codebook size | 8192 |
| Down channels | [128, 256, 384] |
| Downsampling stages | 2 |
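The quantization step of a VQ-VAE maps each encoder output vector to its nearest codebook entry. The sketch below shows that lookup with a random codebook of the size listed above. The `quantize` function, the random data, and the 16x16 grid are illustrative assumptions, not the repository's implementation.

```python
import numpy as np

def quantize(z, codebook):
    """Nearest-neighbour lookup.

    z:        (N, D) encoder outputs
    codebook: (K, D) embedding vectors
    Returns (indices of shape (N,), quantized vectors of shape (N, D)).
    """
    # Squared Euclidean distance, expanded as ||z||^2 - 2 z.e + ||e||^2.
    dist = (
        (z ** 2).sum(axis=1, keepdims=True)
        - 2.0 * z @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    idx = dist.argmin(axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(0)
codebook = rng.standard_normal((8192, 4))   # 8192 entries, 4 latent channels
z = rng.standard_normal((16 * 16, 4))       # one vector per latent position (assumed grid)
idx, z_q = quantize(z, codebook)
```

During training a VQ-VAE also needs a straight-through gradient estimator and codebook/commitment losses. These are omitted here.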

Diffusion Process

| Parameter | Value |
|---|---|
| Timesteps (T) | 1000 |
| Beta schedule | Linear, start=0.0001, end=0.02 |
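For reference, a linear schedule with these endpoints and the standard DDPM forward (noising) process can be written as follows. This assumes the betas are spaced linearly with `np.linspace`. Some codebases instead interpolate the square roots of the endpoints, and the card does not say which variant is used. The name `q_sample` is a common convention, not necessarily the repository's.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # linear schedule from the table above
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)        # cumulative product: alpha_bar_t

def q_sample(x0, t, noise):
    """Draw x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 16, 16))              # stand-in latent
x_noisy = q_sample(x0, t=500, noise=rng.standard_normal(x0.shape))
```

By the final timestep `alpha_bars` has decayed to nearly zero, so `x_T` is almost pure Gaussian noise, which is the starting point for sampling.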

Training Details

| Stage | Epochs | LR | Batch size |
|---|---|---|---|
| VQ-VAE | 10 | 1e-5 | 4 |
| DiT | 500 | 1e-5 | 32 |

  • Dataset: CelebA-HQ, center-cropped and resized to 128x128, normalized to [-1, 1]
  • Data loaded from parquet files via a custom ParquetImageDataset
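The crop and normalization steps listed above might look like the sketch below. It is not the code from ParquetImageDataset: the helper names are hypothetical, the input is a random array standing in for a decoded image, and the resize to 128x128 is omitted because it needs an image library.

```python
import numpy as np

def center_crop(img):
    """Crop the largest centered square from an (H, W, 3) image."""
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return img[top:top + side, left:left + side]

def to_model_range(img_uint8):
    """Map uint8 pixels in [0, 255] to float32 values in [-1, 1]."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0

def to_uint8(x):
    """Inverse mapping, for saving generated samples."""
    return np.clip((x + 1.0) * 127.5, 0, 255).round().astype(np.uint8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(160, 128, 3), dtype=np.uint8)  # stand-in image
x = to_model_range(center_crop(img))
```

The same `[-1, 1]` convention has to be inverted when decoded samples are written back to image files.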

Generated Samples

The repository includes generated face samples in celebhq/samples/ (x0_*.jpg), produced by running the trained DiT in reverse diffusion from Gaussian noise.
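Reverse diffusion from Gaussian noise can be sketched as a standard DDPM ancestral sampling loop. This sketch assumes a noise-prediction model and a per-step variance equal to beta_t, neither of which the card states. A closed-form "oracle" predictor for a dataset containing a single point stands in for the trained DiT so that the loop can run on its own. In the actual pipeline the resulting latent would presumably be passed through the VQ-VAE decoder to obtain an image.

```python
import numpy as np

def ddpm_sample(predict_noise, shape, betas, rng):
    """Ancestral sampling: start from N(0, I) and denoise from t = T-1 down to 0."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):
        eps = predict_noise(x, t)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added at the final step
    return x

betas = np.linspace(1e-4, 0.02, 1000)
alpha_bars = np.cumprod(1.0 - betas)
target = np.full((4, 16, 16), 0.5)  # toy "dataset" containing exactly one latent

def oracle(x, t):
    """Exact noise for a single-point data distribution (stand-in for the trained DiT)."""
    return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

sample = ddpm_sample(oracle, target.shape, betas, np.random.default_rng(0))
```

Because the oracle knows the only possible clean latent, the loop ends on `target`. A trained network replaces `oracle` with a learned estimate.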

Repository Contents

| Path | Description |
|---|---|
| celeba.py | Parquet-based CelebA-HQ dataloader |
| celeba/config.yaml | Full training configuration |
| celebhq/dit_ckpt.pth | Trained DiT checkpoint |
| celebhq/samples/ | Generated sample images |

References

  • Peebles, W. and Xie, S. (2023). Scalable Diffusion Models with Transformers. ICCV 2023.

License

MIT
