Model Card for index-card-detector-v4
YOLO26n object detector for archival index cards. Extends the NLS v1 detector to handle multi-card scans and cards from more institutions, with multi-scale training for input-size robustness.
Model Details
Model Description
Fine-tuned from NationalLibraryOfScotland/archival-index-card-detector/model.pt (YOLO26n) on a mixed-collection dataset of 1,425 archival scans spanning four institutions. Compared to v1, this checkpoint:
- handles multi-card scans (2–9 cards arranged on a sheet, e.g. Navy biographical record series)
- generalises across institutional card styles (NLS, BPL printed, Rubenstein handwritten, Navy typed)
- is trained at random imgsz ±50% per batch (multi_scale=True), so it tolerates input scans from ~640px up to ~1500px without re-tuning the inference imgsz
Single class: card.
- Developed by: Daniel van Strien, Machine Learning Librarian, Hugging Face
- Model type: Object detection (YOLO26n, single class, 2.5M params, ~5.5 MB)
- Language(s): en (cards are English; model is language-agnostic visually but evaluated on English archives)
- License: AGPL-3.0 (inherits from upstream Ultralytics / NLS baseline)
- Finetuned from model: NationalLibraryOfScotland/archival-index-card-detector
Model Sources
- Repository: https://huggingface.co/small-models-for-glam/index-card-detector-v4
- Training dataset: https://huggingface.co/datasets/small-models-for-glam/index-card-detection-v3
- Demo Space: https://huggingface.co/spaces/small-models-for-glam/index-card-detector
Uses
Direct Use
Run the model on any archival scan to locate index cards. Returns one bounding box per detected card. Pair with a downstream OCR/VLM model for content extraction.
Downstream Use
- Crop-then-OCR pipelines — detect cards, crop each, feed to OCR (NuExtract3, Qwen-VL, Trotta, etc.)
- Triage — quickly identify which pages in a digitised collection contain card-like content vs other materials
- Card counting — automated tallies for collection-level metrics
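For the crop-then-OCR case, it can help to leave a small margin around each detected box so the downstream model sees the card edges. The following is a minimal sketch, assuming boxes arrive as [x1, y1, x2, y2] pixel coordinates; the helper name `pad_box` and the 2% default margin are illustrative assumptions, not part of this model or of ultralytics.

```python
import math
from typing import Sequence, Tuple


def pad_box(
    box: Sequence[float],
    width: int,
    height: int,
    pad_frac: float = 0.02,  # assumed margin; tune for your OCR model
) -> Tuple[int, int, int, int]:
    """Expand an [x1, y1, x2, y2] box by pad_frac of its size, clamped to the image."""
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * pad_frac
    pad_y = (y2 - y1) * pad_frac
    left = max(0, math.floor(x1 - pad_x))
    top = max(0, math.floor(y1 - pad_y))
    right = min(width, math.ceil(x2 + pad_x))
    bottom = min(height, math.ceil(y2 + pad_y))
    return left, top, right, bottom
```

The returned tuple follows the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so each card can be cropped with `image.crop(pad_box(box, *image.size))`.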
Out-of-Scope Use
- OCR / content extraction — this is a detection model. It says where the card is, not what it says.
- Content classification — single class. Does not distinguish blank from content, manuscript from typed, etc. For blank/content filtering see small-models-for-glam/index-card-blank-detector.
- Non-English cards — training data is English-only.
- Card-style traditions outside US/UK archives — not validated on continental European library catalogs, East Asian indices, etc.
Bias, Risks, and Limitations
- No true non-card negatives in training beyond the 10 NLS background pages. The model may over-predict on newspaper clippings, photographs, or book pages when those appear in mixed archival scans.
- Multi-card variety concentrated in 25 Navy images. Very different multi-card layouts (12+ card grids, severely overlapping or rotated cards) are out-of-distribution.
- BPL + Rubenstein training labels are auto-generated (bbox = whole image for pre-cropped sources). Boxes on cropped-card inputs are loose by construction. If you need tight per-card pixel-accuracy, validate on a held-out set first.
- Multi-scale training trades 1–3% in-distribution mAP@50:95 for scale robustness. v4's bboxes are slightly looser than v3's (single-scale checkpoint also published at index-card-detector-v3 for comparison). For downstream OCR this is acceptable; for tight cropping you may prefer v3.
Recommendations
- For mixed-content archives (cards + photos + clippings), apply a downstream content classifier or train v5 with explicit negatives.
- For tight pixel-accurate cropping, evaluate v3 vs v4 on your data and pick.
- For non-English collections, fine-tune with additional samples from that tradition.
How to Get Started with the Model
from huggingface_hub import hf_hub_download
from ultralytics import YOLO
from PIL import Image
weights = hf_hub_download(
repo_id="small-models-for-glam/index-card-detector-v4",
filename="best.pt",
)
model = YOLO(weights)
results = model.predict("your_scan.jpg", conf=0.25, imgsz=1024)[0]
for box in results.boxes.xyxy.cpu().tolist():
    print(box)  # [x1, y1, x2, y2] in pixel coords
ONNX (faster CPU inference)
An ONNX export with dynamic axes is also published as best.onnx (~10.7 MB). Use it for 2–3× faster CPU inference via ONNX Runtime:
pip install ultralytics onnxruntime
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

weights = hf_hub_download(
    repo_id="small-models-for-glam/index-card-detector-v4",
    filename="best.onnx",
)
model = YOLO(weights)  # ultralytics auto-detects ONNX and uses onnxruntime
results = model.predict("your_scan.jpg", conf=0.25, imgsz=1024)[0]
The ONNX export is dynamic on the spatial axes (height, width), so it accepts variable input sizes without re-export. Use best.pt if you need GPU acceleration or further fine-tuning; use best.onnx for CPU-only deployment.
Training Details
Training Data
small-models-for-glam/index-card-detection-v3 — 1,425 images, four collections:
| Collection | Rows | Label provenance |
|---|---|---|
| NLS Advocates | 100 | Original NLS bboxes (10 has_card=False negatives) |
| Navy Nurse Corps | 25 | SAM3 bootstrap + human review |
| BPL Catalog | 800 | Auto bbox = whole image (pre-cropped) |
| Rubenstein Manuscript | 500 | Auto bbox = whole image (pre-cropped) |
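To make "Auto bbox = whole image" concrete: in the normalised YOLO text label format (`class x_center y_center width height`, all in the 0 to 1 range), a box covering the full image is centred at (0.5, 0.5) with width and height 1. The sketch below illustrates that conversion; the function names are hypothetical and the actual dataset-building script is not shown in this card.

```python
from typing import Sequence


def yolo_label(box: Sequence[float], width: int, height: int, class_id: int = 0) -> str:
    """Convert an [x1, y1, x2, y2] pixel box to a normalised YOLO label line."""
    x1, y1, x2, y2 = box
    x_center = (x1 + x2) / 2 / width
    y_center = (y1 + y2) / 2 / height
    box_w = (x2 - x1) / width
    box_h = (y2 - y1) / height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}"


def whole_image_label(width: int, height: int) -> str:
    """Label for a pre-cropped card scan: the box is the entire image."""
    return yolo_label((0, 0, width, height), width, height)


print(whole_image_label(1200, 800))  # 0 0.500000 0.500000 1.000000 1.000000
```

Because every pre-cropped image gets this same label regardless of any border around the card, boxes learned from these collections are loose by construction.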
Training Procedure
Training Hyperparameters
- Base architecture: YOLO26n (ultralytics>=8.3)
- Initial weights: NationalLibraryOfScotland/archival-index-card-detector/model.pt
- Epochs cap: 200 (early-stopped via patience)
- Patience: 30
- Image size: 1024 (training)
- multi_scale: True (imgsz randomly ∈ [512, 1536] per batch)
- Batch: 16
- Optimizer: AdamW (auto-tuned by ultralytics — lr0=0.002, momentum=0.9)
- LR schedule: cosine (cos_lr=True), lrf=0.01
- Train/val split: stratified 80/20 per collection (deterministic seed)
- Augmentation: mosaic=0.5, fliplr=0.5, hsv_h=0.015, hsv_s=0.4, hsv_v=0.3, degrees=3, translate=0.05, scale=0.25
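The multi-scale setting above can be pictured as drawing a fresh training resolution for each batch from the ±50% band around the base image size. This is a rough sketch of that idea only, not the ultralytics implementation; the function name `sample_imgsz` is hypothetical, and snapping to a multiple of 32 is an assumption based on the largest detection stride listed in this card.

```python
import random
from typing import Optional


def sample_imgsz(base: int = 1024, stride: int = 32,
                 rng: Optional[random.Random] = None) -> int:
    """Draw a per-batch image size in [0.5 * base, 1.5 * base], snapped to the stride."""
    rng = rng if rng is not None else random.Random()
    low, high = int(base * 0.5), int(base * 1.5)  # 512 and 1536 for base=1024
    size = rng.randint(low, high)                 # inclusive on both ends
    return max(stride, size // stride * stride)   # round down to a stride multiple


rng = random.Random(0)  # fixed seed only so the demo is repeatable
sizes = [sample_imgsz(rng=rng) for _ in range(5)]
print(sizes)
```

Each batch is resized to the sampled size before the forward pass, which is why the trained model sees cards at a wide range of apparent scales.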
Speeds, Sizes, Times
- Hardware: HF Jobs a100-large (1× A100-SXM4-80GB)
- Training time: ~30 minutes (early-stopped)
- Final weights: 5.5 MB
Evaluation
Testing Data, Factors & Metrics
Testing Data
285 held-out images (20% of the full dataset, stratified per collection). No external test set was used.
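A per-collection 80/20 split of this kind could be sketched as follows. The seed value, the function name `stratified_split`, and the placeholder item IDs are assumptions for illustration; the card states only that the split is stratified and deterministic. The collection sizes are taken from the training data table.

```python
import random
from typing import Dict, List, Tuple


def stratified_split(
    items_by_collection: Dict[str, List[str]],
    val_fraction: float = 0.2,
    seed: int = 0,  # assumed seed; the real value is not given in the card
) -> Tuple[List[str], List[str]]:
    """Shuffle each collection separately and hold out val_fraction of it."""
    rng = random.Random(seed)
    train: List[str] = []
    val: List[str] = []
    for name in sorted(items_by_collection):  # fixed order keeps the split reproducible
        items = list(items_by_collection[name])
        rng.shuffle(items)
        n_val = round(len(items) * val_fraction)
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


counts = {"NLS Advocates": 100, "Navy Nurse Corps": 25,
          "BPL Catalog": 800, "Rubenstein Manuscript": 500}
# Placeholder IDs standing in for image paths.
data = {name: [f"{name}/{i}" for i in range(n)] for name, n in counts.items()}
train_ids, val_ids = stratified_split(data)
print(len(train_ids), len(val_ids))  # 1140 285
```

Splitting within each collection keeps the small Navy set represented in validation, at the cost of only 5 validation images for that collection.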
Metrics
mAP@50 and mAP@50:95 (standard COCO-style detection metrics), reported per collection.
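The gap between the two metrics comes from the IoU threshold used to decide whether a predicted box counts as a match. The sketch below shows only that matching step for a single box pair, assuming [x1, y1, x2, y2] boxes; full mAP also averages precision over recall and confidence, which is omitted here. The helper names and the example coordinates are illustrative.

```python
from typing import Sequence

# IoU thresholds 0.50, 0.55, ..., 0.95 used by mAP@50:95.
IOU_THRESHOLDS = [t / 100 for t in range(50, 100, 5)]


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(pred: Sequence[float], truth: Sequence[float]) -> int:
    """Count how many of the ten IoU thresholds this prediction satisfies."""
    score = iou(pred, truth)
    return sum(score >= t for t in IOU_THRESHOLDS)


truth = (100, 100, 500, 400)
loose = (90, 90, 510, 410)  # 10 px of slack on every side
print(round(iou(loose, truth), 3), thresholds_passed(loose, truth))  # 0.893 8
```

A slightly loose box like this one is a clear hit at IoU 0.50 but misses the 0.90 and 0.95 thresholds, so mAP@50 can stay near 1.0 while mAP@50:95 drops.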
Results
| Collection | n val | mAP@50 | mAP@50:95 |
|---|---|---|---|
| NLS Advocates | 20 | 0.995 | 0.857 |
| Navy Nurse Corps | 5 | 0.986 | 0.764 |
| BPL Catalog | 160 | 0.995 | 0.991 |
| Rubenstein Manuscript | 100 | 0.995 | 0.993 |
mAP@50 is essentially saturated across all collections: the model finds nearly every card in the validation split. mAP@50:95 is lower on NLS and Navy because multi-scale training trades bbox tightness for scale robustness (see Bias/Risks above).
For comparison, the single-scale index-card-detector-v3 checkpoint trained on the same data scores higher mAP@50:95 (NLS 0.980, Navy 0.929) but is less robust on out-of-distribution input sizes.
Summary
v4 is the recommended checkpoint for general use across varied input sizes. Use v3 if you need maximum bbox tightness on archives that match the training distribution.
Environmental Impact
Training carbon footprint is small: ~30 min on a single A100. Approximate emissions: ~0.05 kgCO₂eq (based on Lacoste et al. 2019 calculator, averaged grid mix). Per-image CPU inference adds negligible emissions by comparison.
Technical Specifications
Model Architecture and Objective
YOLO26n — single-stage anchor-free detector. 260 layers, 2.5M parameters, 5.8 GFLOPs. Three detection heads at strides 8, 16, 32 over a CSP-Darknet-style backbone.
Compute Infrastructure
- Hardware: HF Jobs a100-large (1× A100-SXM4-80GB)
- Image: vllm/vllm-openai:latest (used as host for CUDA-12-compatible PyTorch — see training script for context)
- Framework: ultralytics 8.4.56, PyTorch 2.11.0+cu130
Citation
BibTeX:
@misc{vanstrien_index_card_detector_v4_2026,
author = {van Strien, Daniel},
title = {{index-card-detector-v4: a YOLO26n object detector for archival index cards}},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/small-models-for-glam/index-card-detector-v4}
}
Model Card Authors
Model Card Contact
Open an issue at the model repo or contact @davanstrien.