Transformer Language Model (Character-level GPT)
A character-level Transformer language model built from scratch in PyTorch, based on Andrej Karpathy's "Let's Build GPT" tutorial and trained on William Shakespeare's Coriolanus.
Model
A causal (autoregressive) Transformer that predicts the next character given up to 256 previous characters. The model uses multi-head self-attention with causal masking, ReLU feed-forward networks, LayerNorm, and residual connections.
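The causal masking described above can be sketched as a single attention head in PyTorch. This is a minimal sketch that assumes the bias-free key/query/value projections and lower-triangular mask used in Karpathy's tutorial. The card does not confirm these details, and the class name `CausalSelfAttentionHead` is hypothetical. In multi-head attention, six such heads of size 64 run in parallel, and their outputs are concatenated back to 384 dimensions.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F


class CausalSelfAttentionHead(nn.Module):
    """One head of masked (causal) self-attention; layout assumed from the tutorial."""

    def __init__(self, n_embd: int = 384, head_size: int = 64,
                 block_size: int = 256, dropout: float = 0.2):
        super().__init__()
        # Bias-free projections (an assumption, not confirmed by the card)
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular mask: position t may attend only to positions 0..t
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.shape[1]                                        # x: (B, T, n_embd), T <= block_size
        q, k, v = self.query(x), self.key(x), self.value(x)  # each (B, T, head_size)
        # Scaled dot-product scores: (B, T, T)
        scores = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5
        # Future positions get -inf, so softmax assigns them zero weight
        scores = scores.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        weights = self.dropout(F.softmax(scores, dim=-1))
        return weights @ v                                    # (B, T, head_size)


head = CausalSelfAttentionHead().eval()  # eval() disables dropout
x = torch.randn(2, 10, 384)              # (batch, time, n_embd)
y = head(x)                              # (2, 10, 64)
```

Because of the mask, changing the characters at positions 5 and later leaves the outputs at positions 0 to 4 unchanged. This property lets the model train on every prefix of a sequence at once without seeing the answer.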
Architecture
- Transformer blocks (layers): 6
- Attention heads: 6
- Embedding dimension: 384 (64 per head)
- Context length (block size): 256
- Dropout: 0.2
- Feed-forward expansion: 4×
- Vocabulary size: character-level (all unique characters in the training text)
- Activation: ReLU
- Normalization: LayerNorm (post-norm in residual connections)
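The architecture values can be collected into a small config object. This also makes the derived sizes explicit: 64 dimensions per head and a 1536-wide feed-forward layer. The parameter estimate is a sketch that assumes the layout from Karpathy's tutorial: bias-free Q/K/V projections, biased output and feed-forward layers, and an untied output head. The card does not state these details or the actual vocabulary size, and `GPTConfig` and `count_parameters` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPTConfig:
    """Hyperparameters from the Architecture list; vocab_size depends on the text."""
    vocab_size: int
    n_layer: int = 6
    n_head: int = 6
    n_embd: int = 384
    block_size: int = 256
    dropout: float = 0.2
    ffn_mult: int = 4

    @property
    def head_size(self) -> int:
        # 384 dimensions / 6 heads = 64 dimensions per head
        assert self.n_embd % self.n_head == 0
        return self.n_embd // self.n_head

    @property
    def ffn_hidden(self) -> int:
        # 4x expansion: 384 -> 1536 -> 384
        return self.ffn_mult * self.n_embd


def count_parameters(cfg: GPTConfig) -> int:
    """Estimate trainable parameters under an ASSUMED tutorial-style layout."""
    d, h = cfg.n_embd, cfg.ffn_hidden
    attn = 3 * d * d + (d * d + d)   # Q/K/V for all heads (no bias) + output projection
    ffn = (d * h + h) + (h * d + d)  # expand and contract linear layers with biases
    norms = 2 * (2 * d)              # two LayerNorms per block (weight + bias)
    per_block = attn + ffn + norms
    embeddings = cfg.vocab_size * d + cfg.block_size * d  # token + position tables
    final_norm = 2 * d
    lm_head = d * cfg.vocab_size + cfg.vocab_size         # untied output layer
    return cfg.n_layer * per_block + embeddings + final_norm + lm_head


cfg = GPTConfig(vocab_size=65)  # 65 is a hypothetical vocabulary size
print(cfg.head_size, cfg.ffn_hidden, count_parameters(cfg))
```

Under these assumptions, the count is 10,738,944 + 769 × vocab_size. For example, a hypothetical 65-character vocabulary gives 10,788,929 parameters (about 10.8M). Dropout adds no parameters.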
Training
Hyperparameters:
- Optimizer
- Learning rate
- Batch size
- Max iterations
- Eval interval
- Train/val split
Final metrics:
- Training loss: 0.8848
- Validation loss: 1.5879
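Assuming the reported losses are mean per-character cross-entropy in nats (which is what `F.cross_entropy` returns), they can be converted to perplexity and bits per character:

```python
import math


def loss_to_perplexity(loss_nats: float) -> float:
    """Per-character perplexity from mean cross-entropy in nats."""
    return math.exp(loss_nats)


def loss_to_bits_per_char(loss_nats: float) -> float:
    """Convert nats to bits by dividing by ln(2)."""
    return loss_nats / math.log(2)


# Final metrics reported above (assumed to be in nats)
metrics = {"train": 0.8848, "val": 1.5879}
for split, loss in metrics.items():
    print(f"{split}: perplexity {loss_to_perplexity(loss):.2f}, "
          f"{loss_to_bits_per_char(loss):.2f} bits/char")
```

This gives a perplexity of about 2.42 on training text versus about 4.89 on validation text, or roughly 1.28 versus 2.29 bits per character. On held-out text, the model is about as uncertain as a uniform choice among five characters. The gap of about 0.70 nats shows that it predicts unseen text considerably worse than the text it was trained on.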
Training Data
Trained on the full text of Coriolanus by William Shakespeare, preprocessed as a character-level sequence. The vocabulary consists of all unique characters present in the play.
Usage
```python
import torch
from torch.nn import functional as F
```
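To generate text, characters must be mapped to integer ids using the same vocabulary as in training. The model then samples one character at a time, cropping the context to the 256-character block size. This sketch is an illustration under assumptions, not this model's published API:

- `build_vocab`, `encode`, `decode`, and `generate` are hypothetical helpers.
- The sorted character order is assumed from the tutorial.
- `generate` assumes the model returns logits of shape (batch, time, vocab), or a `(logits, loss)` tuple as in the tutorial.
- An untrained `nn.Embedding` bigram table stands in for the trained model so the code runs on its own.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F


def build_vocab(text: str):
    """Map each unique character to an id (sorted order assumed, as in the tutorial)."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(s: str, stoi: dict) -> list:
    return [stoi[c] for c in s]


def decode(ids: list, itos: dict) -> str:
    return "".join(itos[i] for i in ids)


@torch.no_grad()
def generate(model: nn.Module, idx: torch.Tensor, max_new_tokens: int,
             block_size: int = 256) -> torch.Tensor:
    """Append max_new_tokens sampled characters to idx of shape (B, T)."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]   # keep at most the last 256 characters
        out = model(idx_cond)
        logits = out[0] if isinstance(out, tuple) else out  # tolerate (logits, loss)
        logits = logits[:, -1, :]         # scores for the next character only
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)   # (B, 1)
        idx = torch.cat((idx, next_id), dim=1)
    return idx


# Usage with a hypothetical stand-in (untrained bigram table), not the trained GPT
text = "First Citizen:\nBefore we proceed any further, hear me speak."
stoi, itos = build_vocab(text)
torch.manual_seed(0)
toy_model = nn.Embedding(len(stoi), len(stoi))  # (B, T) ids -> (B, T, vocab) logits
prompt = torch.tensor([encode("First", stoi)], dtype=torch.long)
out = generate(toy_model, prompt, max_new_tokens=20)
print(decode(out[0].tolist(), itos))
```

With a real checkpoint, the vocabulary would need to be rebuilt from the full text of Coriolanus in the same order used during training; otherwise the ids will not line up with the model's embedding rows.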
This model is shared for educational and research purposes.
References
- Vaswani et al., 2017. Attention Is All You Need (https://arxiv.org/abs/1706.03762)
- Karpathy, A. Let's Build GPT (https://www.youtube.com/watch?v=kCc8FmEb1nY)
- Shakespeare, W. Coriolanus