MNIST CNN Digit Classifier

This is a Convolutional Neural Network (CNN) model trained on the MNIST dataset for handwritten digit classification.

Model Description

This model classifies handwritten digits (0-9) from 28x28 grayscale images using a custom CNN architecture with batch normalization.

Architecture Details:

Input: 28x28 grayscale images (1 channel)
Output: 10 classes (digits 0-9)
Layers: 4 convolutional layers (3x3 kernels, 8 → 16 → 32 → 64 channels), each followed by BatchNorm and ReLU
Pooling: MaxPool2d (2x2, stride 2) after the first conv block
Final Layer: Linear layer (3136 → 10)
Parameters: ~56K trainable parameters
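
The 3136 input features of the final layer follow from the layer sizes listed above. A minimal sketch of that arithmetic, assuming no padding on the conv layers (PyTorch's default, and what the printed architecture suggests); `conv_out` is a helper introduced here for illustration:

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial size after a Conv2d/MaxPool2d layer (floor division, as in PyTorch)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                  # input height/width
size = conv_out(size)                      # conv 1 -> 26
size = conv_out(size, kernel=2, stride=2)  # max pool -> 13
for _ in range(3):                         # conv 2, 3, 4 -> 11, 9, 7
    size = conv_out(size)

flat_features = 64 * size * size           # 64 channels in the last conv layer
print(size, flat_features)  # 7 3136
```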

Usage

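Security Note: Loading this model requires trust_remote_code=True because it uses custom model and processor classes. This runs Python code from the model repository on your machine, so review that code before enabling it.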

Using transformers pipeline

from transformers import pipeline

clf = pipeline(
    "image-classification",
    model="kenil-patel-183/mnist-cnn-digit-classifier",
    trust_remote_code=True,   # required due to custom classes
)

preds = clf("path/to/digit.png", top_k=1)
print(preds)  # [{'label': '7', 'score': 0.998...}]

Using manual loading

import torch
from PIL import Image
from transformers import AutoConfig, AutoModel, AutoImageProcessor

model_id = "kenil-patel-183/mnist-cnn-digit-classifier"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)

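image = Image.open("digit.png")
inputs = processor(images=image, return_tensors="pt")  # preprocess into a batched tensor
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
logits = outputs.logits  # one score per digit class
pred = logits.argmax(-1).item()  # index of the highest score
print(pred)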
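
The logits are unnormalized scores. To report a confidence alongside the predicted digit, they can be passed through a softmax. The sketch below uses plain Python and made-up logits (`example_logits` is hypothetical, standing in for something like `logits.squeeze(0).tolist()`); it illustrates the conversion and is not part of the model's own code:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one image, one entry per digit class 0-9
example_logits = [-2.1, -1.3, 0.4, -0.7, -3.0, -1.9, -4.2, 9.6, -0.5, 1.1]
probs = softmax(example_logits)
pred = max(range(10), key=probs.__getitem__)  # class with the highest probability
print(pred, round(probs[pred], 3))
```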

Model Architecture

MNISTCNN(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (lin): Linear(in_features=3136, out_features=10, bias=True)
  (network): Sequential(
    (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1))
    (1): BatchNorm2d(8, eps=1e-05, momentum=0.1)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=(2, 2), stride=2)
    (4): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1))
    (5): BatchNorm2d(16, eps=1e-05, momentum=0.1)
    (6): ReLU()
    (7): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
    (8): BatchNorm2d(32, eps=1e-05, momentum=0.1)
    (9): ReLU()
    (10): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
    (11): BatchNorm2d(64, eps=1e-05, momentum=0.1)
    (12): ReLU()
  )
)
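
The parameter count can be checked by hand from the printed layers. A rough sketch, assuming the PyTorch defaults that the printout does not contradict (bias terms on every Conv2d, affine BatchNorm2d with running statistics):

```python
conv_channels = [(1, 8), (8, 16), (16, 32), (32, 64)]

trainable = 0
buffers = 0
for c_in, c_out in conv_channels:
    trainable += c_in * c_out * 3 * 3 + c_out  # Conv2d weights + bias
    trainable += 2 * c_out                     # BatchNorm2d weight + bias
    buffers += 2 * c_out + 1                   # running_mean, running_var, num_batches_tracked
trainable += 3136 * 10 + 10                    # Linear weights + bias

print(trainable)            # 55994
print(trainable + buffers)  # 56238
```

Under these assumptions the total including BatchNorm buffers is close to the 56.2k reported for the checkpoint, and more than half of the trainable parameters sit in the final linear layer.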

Training Data

Dataset: MNIST Handwritten Digits
Training samples: 60,000
Test samples: 10,000
Image size: 28x28 grayscale
Classes: 10 (digits 0-9)

Image Preprocessing Requirements

For best results, input images should be preprocessed as follows:

Convert to grayscale if not already
Resize to 28x28 pixels
Convert to tensor (values between 0 and 1)
Normalize with mean=0.1307, std=0.3081

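from torchvision import transforms

transform = transforms.Compose([
    transforms.Grayscale(),                      # 1 output channel
    transforms.Resize((28, 28)),                 # match the MNIST input size
    transforms.ToTensor(),                       # scale pixel values to [0, 1]
    transforms.Normalize((0.1307,), (0.3081,)),  # MNIST mean and std
])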
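
To see what the last step does to the values, the normalization can be worked through for the two extreme pixel intensities. This is only the arithmetic `(x - mean) / std` written out in plain Python; `normalize` is a name introduced here for illustration:

```python
MEAN, STD = 0.1307, 0.3081

def normalize(pixel):
    """Map a pixel in [0, 1] (after ToTensor) to the normalized input value."""
    return (pixel - MEAN) / STD

black = normalize(0.0)  # darkest possible pixel
white = normalize(1.0)  # brightest possible pixel
print(round(black, 4), round(white, 4))  # -0.4242 2.8215
```

The network therefore sees inputs in roughly the range [-0.42, 2.82] rather than [0, 1].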

Performance

The model achieves 99.25% accuracy on the MNIST test set (10,000 images).

Limitations

Input format: The network expects 28x28 single-channel input; other images must be converted to grayscale and resized first
Domain: Optimized for handwritten digits; may not work well on printed text
Background: Works best with dark digits on light background
Noise: Performance may degrade with noisy or heavily distorted images

Safetensors: model size 56.2k params, tensor type F32