Model Card for index-card-detector-v4

YOLO26n object detector for archival index cards. Extends the NLS v1 detector to handle multi-card scans and cards from more institutions, with multi-scale training for input-size robustness.

Model Details

Model Description

Fine-tuned from NationalLibraryOfScotland/archival-index-card-detector/model.pt (YOLO26n) on a mixed-collection dataset of 1,425 archival scans spanning four institutions. Compared to v1, this checkpoint:

  • handles multi-card scans (2–9 cards arranged on a sheet, e.g. Navy biographical record series)
  • generalises across institutional card styles (NLS, BPL printed, Rubenstein handwritten, Navy typed)
  • is trained with a random imgsz within ±50% of 1024 per batch (multi_scale=True), so it tolerates input scans from ~640px up to ~1500px without re-tuning the inference imgsz

Single class: card.

  • Developed by: Daniel van Strien, Machine Learning Librarian, Hugging Face
  • Model type: Object detection (YOLO26n, single class, 2.5M params, ~5.5 MB)
  • Language(s): en (cards are English; model is language-agnostic visually but evaluated on English archives)
  • License: AGPL-3.0 (inherits from upstream Ultralytics / NLS baseline)
  • Finetuned from model: NationalLibraryOfScotland/archival-index-card-detector

Uses

Direct Use

Run the model on any archival scan to locate index cards. Returns one bounding box per detected card. Pair with a downstream OCR/VLM model for content extraction.

Downstream Use

  • Crop-then-OCR pipelines — detect cards, crop each, feed to OCR (NuExtract3, Qwen-VL, Trotta, etc.)
  • Triage — quickly identify which pages in a digitised collection contain card-like content vs other materials
  • Card counting — automated tallies for collection-level metrics
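
For the crop-then-OCR pipeline, a small margin around each detected box can help avoid clipping text at the card edge. The following is a minimal sketch rather than part of the published model code: pad_and_clamp and its pad_frac default are hypothetical names and values chosen for illustration.

```python
import math


def pad_and_clamp(box, img_w, img_h, pad_frac=0.02):
    """Expand an [x1, y1, x2, y2] box by pad_frac of its size, clamped to the image.

    pad_frac=0.02 is an illustrative default, not a tuned value.
    """
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * pad_frac
    pad_y = (y2 - y1) * pad_frac
    # Round outwards so the crop never cuts into the detected box.
    left = max(0, math.floor(x1 - pad_x))
    top = max(0, math.floor(y1 - pad_y))
    right = min(img_w, math.ceil(x2 + pad_x))
    bottom = min(img_h, math.ceil(y2 + pad_y))
    return left, top, right, bottom
```

The returned tuple is in the (left, upper, right, lower) order that PIL's Image.crop expects, so with an opened image img a card crop would be img.crop(pad_and_clamp(box, *img.size)).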

Out-of-Scope Use

  • OCR / content extraction — this is a detection model. It says where the card is, not what it says.
  • Content classification — single class. Does not distinguish blank from content, manuscript from typed, etc. For blank/content filtering see small-models-for-glam/index-card-blank-detector.
  • Non-English cards — training data is English-only.
  • Card-style traditions outside US/UK archives — not validated on continental European library catalogs, East Asian indices, etc.

Bias, Risks, and Limitations

  • No true non-card negatives in training beyond the 10 NLS background pages. The model may over-predict on newspaper clippings, photographs, or book pages when those appear in mixed archival scans.
  • Multi-card variety concentrated in 25 Navy images. Very different multi-card layouts (12+ card grids, severely overlapping or rotated cards) are out-of-distribution.
  • BPL + Rubenstein training labels are auto-generated (bbox = whole image for pre-cropped sources). Boxes on cropped-card inputs are loose by construction. If you need pixel-accurate per-card boxes, validate on a held-out set first.
  • Multi-scale training trades 1–3% in-distribution mAP@50:95 for scale robustness, with a larger gap on NLS (0.857 vs 0.980 for v3) and Navy (0.764 vs 0.929 for v3). v4's bboxes are looser than v3's (the single-scale checkpoint is also published as index-card-detector-v3 for comparison). For downstream OCR this is acceptable; for tight cropping you may prefer v3.

Recommendations

  • For mixed-content archives (cards + photos + clippings), apply a downstream content classifier or train v5 with explicit negatives.
  • For tight pixel-accurate cropping, evaluate v3 vs v4 on your data and pick.
  • For non-English collections, fine-tune with additional samples from that tradition.
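
To compare v3 and v4 on your own data, one simple approach is to measure how tightly each checkpoint's boxes overlap a handful of hand-drawn reference boxes. The sketch below assumes boxes are given as [x1, y1, x2, y2] pixel lists; mean_best_iou is a hypothetical helper for illustration and is not a substitute for a full mAP evaluation.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_best_iou(pred_boxes, truth_boxes):
    """Average, over reference boxes, of the best IoU achieved by any prediction."""
    if not truth_boxes:
        return 0.0
    best = [max((iou(p, t) for p in pred_boxes), default=0.0) for t in truth_boxes]
    return sum(best) / len(best)
```

Running mean_best_iou once with v3 predictions and once with v4 predictions on the same reference boxes gives a rough tightness score per checkpoint; a missed card contributes 0 to the average.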

How to Get Started with the Model

from huggingface_hub import hf_hub_download
from ultralytics import YOLO

weights = hf_hub_download(
    repo_id="small-models-for-glam/index-card-detector-v4",
    filename="best.pt",
)
model = YOLO(weights)

results = model.predict("your_scan.jpg", conf=0.25, imgsz=1024)[0]
for box in results.boxes.xyxy.cpu().tolist():
    print(box)  # [x1, y1, x2, y2] in pixel coords

ONNX (faster CPU inference)

An ONNX export with dynamic axes is also published as best.onnx (~10.7 MB). Use it for 2–3× faster CPU inference via ONNX Runtime:

# Install first (shell): pip install ultralytics onnxruntime
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

weights = hf_hub_download(
    repo_id="small-models-for-glam/index-card-detector-v4",
    filename="best.onnx",
)
model = YOLO(weights)  # ultralytics auto-detects ONNX and uses onnxruntime
results = model.predict("your_scan.jpg", conf=0.25, imgsz=1024)[0]

The ONNX export is dynamic on the spatial axes (height, width), so it accepts variable input sizes without re-export. Use best.pt if you need GPU acceleration or further fine-tuning; use best.onnx for CPU-only deployment.
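
When the ONNX file is loaded through ultralytics as above, resizing and padding are handled for you. If you instead feed ONNX Runtime directly, you have to choose the input height and width yourself. The sketch below shows one way to pick them: it assumes both sides should be multiples of the largest detection stride (32), which is typical for YOLO-family exports but has not been verified for this file. letterbox_shape is a hypothetical helper, not an ultralytics function.

```python
import math


def letterbox_shape(width, height, imgsz=1024, stride=32):
    """Return (new_w, new_h, padded_w, padded_h) for an aspect-preserving resize.

    The longest side is scaled to imgsz, then each side is padded up to the
    next multiple of stride. Treating stride=32 as a hard requirement is an
    assumption, not something stated for this export.
    """
    ratio = imgsz / max(width, height)
    new_w, new_h = round(width * ratio), round(height * ratio)
    padded_w = math.ceil(new_w / stride) * stride
    padded_h = math.ceil(new_h / stride) * stride
    return new_w, new_h, padded_w, padded_h
```

For a 1500×1000 scan this yields a 1024×683 resize padded to 1024×704, so the variable-size input stays close to the scan's aspect ratio instead of being forced to a square.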

Training Details

Training Data

small-models-for-glam/index-card-detection-v3 — 1,425 images, four collections:

| Collection | Rows | Label provenance |
|---|---|---|
| NLS Advocates | 100 | Original NLS bboxes (10 has_card=False negatives) |
| Navy Nurse Corps | 25 | SAM3 bootstrap + human review |
| BPL Catalog | 800 | Auto bbox = whole image (pre-cropped) |
| Rubenstein Manuscript | 500 | Auto bbox = whole image (pre-cropped) |

Training Procedure

Training Hyperparameters

  • Base architecture: YOLO26n (ultralytics>=8.3)
  • Initial weights: NationalLibraryOfScotland/archival-index-card-detector/model.pt
  • Epochs cap: 200 (early-stopped via patience)
  • Patience: 30
  • Image size: 1024 (training)
  • multi_scale: True (imgsz randomly ∈ [512, 1536] per batch)
  • Batch: 16
  • Optimizer: AdamW (auto-tuned by ultralytics — lr0=0.002, momentum=0.9)
  • LR schedule: cosine (cos_lr=True), lrf=0.01
  • Train/val split: stratified 80/20 per collection (deterministic seed)
  • Augmentation: mosaic=0.5, fliplr=0.5, hsv_h=0.015, hsv_s=0.4, hsv_v=0.3, degrees=3, translate=0.05, scale=0.25
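
The multi-scale range quoted above follows directly from the training image size: ±50% of 1024 gives [512, 1536]. The sketch below reproduces that arithmetic and draws a per-batch size from it. It is an illustration only, not the ultralytics sampling code, and snapping to a multiple of 32 is an assumption based on the model's largest stride.

```python
import random


def multi_scale_bounds(imgsz=1024, frac=0.5):
    """Lower and upper image sizes for a +/- frac range around imgsz."""
    return int(imgsz * (1 - frac)), int(imgsz * (1 + frac))


def sample_imgsz(rng, imgsz=1024, frac=0.5, stride=32):
    """Draw one size in the range, snapped down to a multiple of stride."""
    low, high = multi_scale_bounds(imgsz, frac)
    return rng.randint(low, high) // stride * stride


print(multi_scale_bounds())  # (512, 1536)
```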

Speeds, Sizes, Times

  • Hardware: HF Jobs a100-large (1× A100-SXM4-80GB)
  • Training time: ~30 minutes (early-stopped)
  • Final weights: 5.5 MB

Evaluation

Testing Data, Factors & Metrics

Testing Data

285 held-out images (20% of the 1,425-image dataset, stratified per collection). No external test set was used.
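
A stratified 80/20 split of this kind can be sketched as follows. This is not the author's split script: stratified_split is a hypothetical helper, and seed=0 is a placeholder because the seed used for the published split is not stated.

```python
import random


def stratified_split(items_by_collection, val_frac=0.2, seed=0):
    """Split each collection separately so every one keeps the same val share."""
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    train, val = [], []
    for name in sorted(items_by_collection):  # sorted for a stable order
        items = list(items_by_collection[name])
        rng.shuffle(items)
        n_val = round(len(items) * val_frac)
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


# Collection sizes from the training data table; items are stand-in IDs.
sizes = {"NLS": 100, "Navy": 25, "BPL": 800, "Rubenstein": 500}
data = {name: [(name, i) for i in range(n)] for name, n in sizes.items()}
train, val = stratified_split(data)
print(len(train), len(val))  # 1140 285
```

With the collection sizes listed under Training Data, this gives 20, 5, 160 and 100 validation images per collection, matching the n (val) column in the results.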

Metrics

mAP@50 and mAP@50:95 (standard COCO-style detection metrics), reported per collection.
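
The two metrics differ in how strict they are about box tightness: mAP@50 accepts a prediction once its IoU with the ground truth reaches 0.50, while mAP@50:95 averages over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below is not a full mAP implementation; thresholds_passed is a hypothetical helper that only shows how many of those thresholds a single matched box would clear.

```python
def thresholds_passed(iou_value):
    """Fraction of the ten COCO IoU thresholds (0.50, 0.55, ..., 0.95) cleared."""
    cutoffs = [t / 100 for t in range(50, 100, 5)]
    return sum(iou_value >= c for c in cutoffs) / len(cutoffs)


print(thresholds_passed(0.97))  # 1.0: a tight box counts at every threshold
print(thresholds_passed(0.82))  # 0.7: a looser box still counts at IoU 0.50
```

A detector whose boxes are correct but loose can therefore score near 1.0 on mAP@50 while scoring noticeably lower on mAP@50:95.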

Results

| Collection | n (val) | mAP@50 | mAP@50:95 |
|---|---|---|---|
| NLS Advocates | 20 | 0.995 | 0.857 |
| Navy Nurse Corps | 5 | 0.986 | 0.764 |
| BPL Catalog | 160 | 0.995 | 0.991 |
| Rubenstein Manuscript | 100 | 0.995 | 0.993 |

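mAP@50 is essentially saturated across all collections: the model finds nearly every card in the validation split. mAP@50:95 is lower on NLS and Navy because multi-scale training trades bbox tightness for scale robustness (see Bias, Risks, and Limitations above). The Navy figures rest on only 5 validation images, so treat them as indicative.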

For comparison, the single-scale index-card-detector-v3 checkpoint trained on the same data scores higher mAP@50:95 (NLS 0.980, Navy 0.929) but is less robust on out-of-distribution input sizes.

Summary

v4 is the recommended checkpoint for general use across varied input sizes. Use v3 if you need maximum bbox tightness on archives that match the training distribution.

Environmental Impact

Training carbon footprint is small: ~30 min on a single A100. Approximate emissions: ~0.05 kgCO₂eq (based on the Lacoste et al. 2019 calculator, averaged grid mix). Emissions from CPU inference are negligible by comparison.

Technical Specifications

Model Architecture and Objective

YOLO26n — single-stage anchor-free detector. 260 layers, 2.5M parameters, 5.8 GFLOPs. Three detection heads at strides 8, 16, 32 over a CSP-Darknet-style backbone.

Compute Infrastructure

  • Hardware: HF Jobs a100-large (1× A100-SXM4-80GB)
  • Image: vllm/vllm-openai:latest (used as host for CUDA-12-compatible PyTorch — see training script for context)
  • Framework: ultralytics 8.4.56, PyTorch 2.11.0+cu130

Citation

BibTeX:

@misc{vanstrien_index_card_detector_v4_2026,
  author       = {van Strien, Daniel},
  title        = {{index-card-detector-v4: a YOLO26n object detector for archival index cards}},
  year         = {2026},
  publisher    = {Hugging Face},
  url          = {https://huggingface.co/small-models-for-glam/index-card-detector-v4}
}

Model Card Authors

Daniel van Strien

Model Card Contact

Open an issue at the model repo or contact @davanstrien.
