steganograph-ia-detector

Fine-tuned Vision Transformer (ViT) for AI-generated image detection.

This model is a binary classifier trained to distinguish real photographs from images generated by modern AI models (Stable Diffusion 2.1 / XL / 3, DALL-E 3, Midjourney v6).

Part of the SteganographIA project (MIAGE TPI).

Performance on the test set (15,000 balanced images)

Metric                    Value
Accuracy                  0.923
Real precision / recall   0.88 / 0.99
AI precision / recall     0.98 / 0.86

The model is conservative: it almost never accuses a real image of being AI-generated (1.4% false positive rate), but misses ~14% of AI images by classifying them as real.
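As a sanity check, the reported figures can be tied together with a little arithmetic. The sketch below assumes the 15,000 test images split evenly into 7,500 real and 7,500 AI images, and rebuilds approximate confusion-matrix counts from the rounded rates above. The counts are reconstructed for illustration and are not published values.

```python
# Hypothetical confusion-matrix counts, reconstructed from the rounded rates
# in this card (assumption: 7,500 real + 7,500 AI images in the test set).
n_real, n_ai = 7_500, 7_500

false_positives = round(0.014 * n_real)     # real images flagged as AI
true_negatives = n_real - false_positives   # real images kept as real
true_positives = round(0.86 * n_ai)         # AI images correctly detected
false_negatives = n_ai - true_positives     # AI images classified as real

accuracy = (true_positives + true_negatives) / (n_real + n_ai)
ai_precision = true_positives / (true_positives + false_positives)
ai_recall = true_positives / n_ai
real_precision = true_negatives / (true_negatives + false_negatives)
real_recall = true_negatives / n_real

print(f"accuracy={accuracy:.3f}")
print(f"real precision/recall={real_precision:.2f}/{real_recall:.2f}")
print(f"AI precision/recall={ai_precision:.2f}/{ai_recall:.2f}")
```

Under these assumptions the recomputed values round to the figures in the table, which shows how a small false positive rate and a 14% miss rate combine into the overall accuracy.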

Training details

  • Base model: google/vit-base-patch16-224
  • Dataset: Rajarshi-Roy-research/Defactify_Image_Dataset (Defactify Challenge @ AAAI), rebalanced 50/50 by undersampling the AI class, with the sample stratified across the 5 generators.
  • Train / Val / Test split: 14,000 / 3,000 / 15,000 images.
  • Optimizer: AdamW
  • Learning rate: 2e-5
  • Batch size: 16
  • Epochs: 3
  • Mixed precision (fp16) on a single Tesla T4 GPU.
  • Training time: ~16 minutes.
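The rebalancing step in the Dataset bullet can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing code: the function name, the `(image_id, generator)` record layout, and the choice of an equal quota per generator are all assumptions (a proportional quota would be another valid reading of "stratified").

```python
import random
from collections import Counter, defaultdict


def undersample_ai_stratified(records, n_target, seed=0):
    """Pick n_target AI records, spread as evenly as possible across generators.

    `records` is a list of (image_id, generator_name) pairs for the AI class.
    Hypothetical helper: the record layout is assumed for illustration.
    """
    rng = random.Random(seed)
    by_generator = defaultdict(list)
    for image_id, generator in records:
        by_generator[generator].append(image_id)

    generators = sorted(by_generator)
    base, remainder = divmod(n_target, len(generators))
    selected = []
    for i, generator in enumerate(generators):
        # The first `remainder` generators take one extra image each.
        quota = base + (1 if i < remainder else 0)
        pool = by_generator[generator]
        if quota > len(pool):
            raise ValueError(f"not enough images for generator {generator!r}")
        selected.extend((image_id, generator) for image_id in rng.sample(pool, quota))
    return selected


# Toy data: 5 generators with 100 AI images each, to be matched to 50 real images.
generator_names = ["sd21", "sdxl", "sd3", "dalle3", "midjourney6"]  # illustrative labels
ai_records = [(f"{g}_{i}", g) for g in generator_names for i in range(100)]
balanced_ai = undersample_ai_stratified(ai_records, n_target=50)
per_generator = Counter(generator for _, generator in balanced_ai)
print(per_generator)
```

Sampling per generator keeps any single generator from dominating the AI half of the balanced set.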

How to use

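from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load the preprocessing pipeline and the fine-tuned classifier from the Hub.
processor = AutoImageProcessor.from_pretrained("delpot/steganograph-ia-detector")
model = AutoModelForImageClassification.from_pretrained("delpot/steganograph-ia-detector")
model.eval()

# Convert to RGB so grayscale or RGBA files match the expected 3-channel input.
image = Image.open("path/to/image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")

# Inference only: no gradients needed.
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring class and map its index to a label name.
predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])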

Limitations

  • Trained on Defactify only. Generalization to generators outside the training set (notably Midjourney V1/V2, Flux, or future models) is not guaranteed.
  • Lower recall on the AI class: ~14% of AI-generated images are classified as real. For stricter use cases (e.g. moderation of AI content), the decision threshold on the AI-class probability could be lowered, trading more false positives on real images for higher AI recall.
  • Input images are resized to 224x224, so very high-resolution artifacts may be lost.
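The threshold adjustment mentioned above could look like the sketch below. It is a dependency-free illustration, not part of the released model: `ai_probability`, `is_ai_generated`, the example logits, and `AI_INDEX = 1` are all hypothetical. The real index of the AI class should be read from `model.config.id2label`, and any threshold should be tuned on a validation set.

```python
import math


def ai_probability(logits, ai_index):
    """Softmax probability of the AI class from a list of raw logits."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    return exps[ai_index] / sum(exps)


def is_ai_generated(logits, ai_index, threshold=0.5):
    """Flag an image as AI-generated when its AI probability reaches the threshold."""
    return ai_probability(logits, ai_index) >= threshold


AI_INDEX = 1                  # assumption: check model.config.id2label for the real index
example_logits = [0.4, -0.1]  # made-up logits for one image

print(round(ai_probability(example_logits, AI_INDEX), 3))
print(is_ai_generated(example_logits, AI_INDEX, threshold=0.5))  # same decision as argmax
print(is_ai_generated(example_logits, AI_INDEX, threshold=0.3))  # stricter setting
```

With the snippet from "How to use", the input would be `logits[0].tolist()`. For a two-class model a threshold of 0.5 reproduces the argmax decision, and lowering it catches more AI images at the cost of flagging more real ones.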