Transformer Language Model (Character-level GPT)

A character-level Transformer language model built from scratch in PyTorch, based on Andrej Karpathy's "Let's Build GPT" tutorial. Trained on William Shakespeare's Coriolanus.

Model

A causal (autoregressive) Transformer that predicts the next character given up to 256 previous characters. The model uses multi-head self-attention with causal masking, ReLU feed-forward networks, LayerNorm, and residual connections.
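To make the causal masking concrete, here is a minimal sketch of a single self-attention head in PyTorch. The class name `CausalSelfAttentionHead` and its argument names are illustrative assumptions, not taken from this model's code. The sizes (384-dimensional embeddings, 64 per head, context of 256) follow the figures in this card.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F


class CausalSelfAttentionHead(nn.Module):
    """One self-attention head with a causal (lower-triangular) mask.

    Illustrative sketch only; names and layout are assumptions.
    """

    def __init__(self, n_embd: int, head_size: int, block_size: int, dropout: float = 0.0):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        self.dropout = nn.Dropout(dropout)
        # Lower-triangular matrix: position t may attend to positions <= t only.
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        k = self.key(x)    # (B, T, head_size)
        q = self.query(x)  # (B, T, head_size)
        # Scaled dot-product scores between every pair of positions.
        scores = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # (B, T, T)
        # Hide future positions so the softmax assigns them zero weight.
        scores = scores.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        weights = self.dropout(F.softmax(scores, dim=-1))
        return weights @ self.value(x)  # (B, T, head_size)


head = CausalSelfAttentionHead(n_embd=384, head_size=64, block_size=256)
head.eval()  # disable dropout for a deterministic check
x = torch.randn(2, 10, 384)
out = head(x)
print(out.shape)
```

Because of the mask, changing a later character cannot alter the output at any earlier position, which is what allows the model to be trained on next-character prediction at every position in parallel.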

Architecture

Transformer blocks (layers): 6
Attention heads: 6
Embedding dimension: 384 (64 per head)
Context length (block size): 256
Dropout: 0.2
Feed-forward expansion: 4×
Vocabulary size: character-level (from training text)
Activation: ReLU
Normalization: LayerNorm (post-norm in residual connections)
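The listing above can be read as a stack of identical blocks. The following is a rough sketch of one post-norm block, assuming the sizes given in this card. It uses PyTorch's built-in `nn.MultiheadAttention` as a stand-in for the from-scratch attention heads, so it will not match the original implementation parameter for parameter. The name `PostNormBlock` is hypothetical.

```python
import torch
import torch.nn as nn

# Values taken from the architecture listing in this card.
N_EMBD, N_HEAD, N_LAYER, DROPOUT = 384, 6, 6, 0.2


class PostNormBlock(nn.Module):
    """Transformer block with LayerNorm applied after each residual add.

    Sketch under stated assumptions; nn.MultiheadAttention is a stand-in.
    """

    def __init__(self, n_embd: int = N_EMBD, n_head: int = N_HEAD, dropout: float = DROPOUT):
        super().__init__()
        assert n_embd % n_head == 0, "embedding size must divide evenly across heads"
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        # Feed-forward network with 4x expansion and ReLU activation.
        self.ffwd = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.size(1)
        # Boolean mask: True marks future positions that must not be attended to.
        causal_mask = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1
        )
        attn_out, _ = self.attn(x, x, x, attn_mask=causal_mask, need_weights=False)
        x = self.ln1(x + attn_out)      # post-norm: normalize after the residual add
        x = self.ln2(x + self.ffwd(x))  # same pattern around the feed-forward network
        return x


blocks = nn.Sequential(*[PostNormBlock() for _ in range(N_LAYER)])
blocks.eval()
x = torch.randn(2, 16, N_EMBD)
y = blocks(x)
head_size = N_EMBD // N_HEAD  # 384 / 6 = 64 dimensions per head
print(y.shape, head_size)
```

A pre-norm variant would instead apply LayerNorm to the input of each sub-layer, as in `x = x + attn(ln1(x))`. The sketch follows the post-norm ordering stated in the listing.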

Training

Hyperparameters: optimizer, learning rate, batch size, max iterations, eval interval, train/val split (values not listed).

Final metrics:

Training loss: 0.8848
Validation loss: 1.5879
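As a reading aid, the losses above can be converted to per-character perplexity and bits per character. This assumes they are mean cross-entropy values in nats, which is the default for PyTorch's `F.cross_entropy` but is not stated in this card.

```python
import math

train_loss = 0.8848  # reported training loss
val_loss = 1.5879    # reported validation loss

# Assumption: both values are mean cross-entropy in nats per character.
train_ppl = math.exp(train_loss)  # perplexity = exp(cross-entropy)
val_ppl = math.exp(val_loss)

# Convert nats to bits by dividing by ln(2).
val_bpc = val_loss / math.log(2)

# Difference between validation and training loss.
gap = val_loss - train_loss

print(f"train perplexity: {train_ppl:.2f}")
print(f"val perplexity:   {val_ppl:.2f}")
print(f"val bits/char:    {val_bpc:.2f}")
print(f"val - train gap:  {gap:.4f}")
```

A perplexity of p means the model is, on average, about as uncertain as a uniform choice among p characters. The gap between the two losses indicates how much better the model fits the training split than held-out text.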

Training Data

Trained on the full text of Coriolanus by William Shakespeare, preprocessed as a character-level sequence. The vocabulary consists of all unique characters present in the play.

Usage

import torch
from torch.nn import functional as F
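A minimal sketch of character-level encoding and autoregressive sampling is given below. It does not load this model's weights: the checkpoint format is not described here, so an untrained `nn.Embedding` serves as a placeholder model and produces random characters. The helper names `encode`, `decode`, and `generate` and the short sample text are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

# Stand-in text; the real vocabulary is built from the full training text.
text = "First Citizen:\nBefore we proceed any further, hear me speak."
chars = sorted(set(text))                      # all unique characters
stoi = {ch: i for i, ch in enumerate(chars)}   # character -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> character


def encode(s: str) -> list[int]:
    return [stoi[c] for c in s]


def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)


BLOCK_SIZE = 256  # context length from the architecture listing


@torch.no_grad()
def generate(model, idx: torch.Tensor, max_new_tokens: int, block_size: int = BLOCK_SIZE):
    """Sample characters one at a time; model maps (B, T) ids to (B, T, vocab) logits."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]              # crop to the context window
        logits = model(idx_cond)                     # (B, T, vocab_size)
        probs = F.softmax(logits[:, -1, :], dim=-1)  # distribution over the next character
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)       # append and continue
    return idx


# Placeholder model: an untrained lookup table, NOT the trained Transformer.
model = nn.Embedding(len(chars), len(chars))
prompt = torch.tensor([encode("First")], dtype=torch.long)
out = generate(model, prompt, max_new_tokens=20)
sample = decode(out[0].tolist())
print(repr(sample))
```

With the trained model in place of the placeholder, the same loop would apply, provided the model returns logits of shape (batch, time, vocabulary size) and the vocabulary is rebuilt from the same training text.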

This model is shared for educational and research purposes.

References

Attention Is All You Need (https://arxiv.org/abs/1706.03762) — Vaswani et al., 2017
Let's Build GPT (https://www.youtube.com/watch?v=kCc8FmEb1nY) — Andrej Karpathy
Shakespeare, W. Coriolanus
