Transformer Language Model (Character-level GPT)

A character-level Transformer language model built from scratch in PyTorch, based on Andrej Karpathy's "Let's Build GPT" tutorial. Trained on William Shakespeare's Coriolanus.

Model

A causal (autoregressive) Transformer that predicts the next character given up to 256 previous characters. The model uses multi-head self-attention with causal masking, ReLU feed-forward networks, LayerNorm, and residual connections.
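Causal masking means that position i may attend only to positions 0..i, so a prediction never sees future characters. The following is a minimal, dependency-free sketch of that idea, not the model's actual code: the function name `causal_attention_weights` and the toy score matrix are illustrative assumptions. In PyTorch the same effect is commonly obtained by filling the upper triangle of the score matrix with negative infinity before the softmax.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over a square score matrix with a causal mask.

    scores[i][j] is the raw (already scaled) affinity of query position i
    for key position j. Entries with j > i are future positions and
    receive a weight of exactly 0.
    """
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]            # positions 0..i only
        peak = max(visible)               # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        future = [0.0] * (len(row) - i - 1)
        weights.append([e / total for e in exps] + future)
    return weights

# Hypothetical scores for a 4-character context (all equal, for readability).
scores = [[0.0] * 4 for _ in range(4)]
weights = causal_attention_weights(scores)
for row in weights:
    print([round(w, 2) for w in row])
```

With equal scores, each row spreads its weight uniformly over the positions it is allowed to see, which makes the lower-triangular pattern easy to read.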

Architecture

  • Transformer blocks (layers): 6
  • Attention heads: 6
  • Embedding dimension: 384 (64 per head)
  • Context length (block size): 256
  • Dropout: 0.2
  • Feed-forward expansion: 4×
  • Vocabulary size: character-level (derived from the training text)
  • Activation: ReLU
  • Normalization: LayerNorm (post-norm in residual connections)
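The settings above fix most of the model's size. As a rough sketch, the parameter count can be estimated from them under several assumptions that this card does not confirm: query/key/value projections without bias, biased output and feed-forward layers, learned position embeddings, a final LayerNorm, and an untied output head. The function name `count_parameters` and the vocabulary size of 65 are hypothetical, since the actual size depends on the characters in the training text.

```python
def count_parameters(vocab_size, n_layer=6, n_head=6, n_embd=384,
                     block_size=256, ff_mult=4):
    """Estimate trainable parameters for a GPT-style model (see assumptions above)."""
    assert n_embd % n_head == 0, "embedding dimension must divide evenly across heads"
    head_size = n_embd // n_head

    # Attention: Q, K, V projections without bias (assumption),
    # plus an output projection with bias.
    attn = 3 * n_embd * n_embd + (n_embd * n_embd + n_embd)

    # Feed-forward: expand by ff_mult, then project back, both with bias.
    hidden = ff_mult * n_embd
    ffwd = (n_embd * hidden + hidden) + (hidden * n_embd + n_embd)

    # Two LayerNorms per block, each with a scale and a shift vector.
    norms = 2 * (2 * n_embd)
    per_block = attn + ffwd + norms

    embeddings = vocab_size * n_embd + block_size * n_embd  # token + position tables
    final_norm = 2 * n_embd                                 # assumption: final LayerNorm
    lm_head = n_embd * vocab_size + vocab_size              # assumption: untied, with bias

    total = n_layer * per_block + embeddings + final_norm + lm_head
    return head_size, total

# vocab_size=65 is a hypothetical value, not taken from this model.
head_size, total = count_parameters(vocab_size=65)
print(f"head size: {head_size}, parameters: {total:,}")
```

Nearly all of the parameters sit in the six Transformer blocks; the vocabulary-dependent terms contribute only a small fraction at character level.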

Training

Hyperparameter Optimizer Learning rate Batch size Max iterations Eval interval Train/val split
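To illustrate how a train/val split and a batch size interact with next-character prediction, here is a minimal sketch in plain Python. The 90/10 split, the function name `split_and_sample`, and the toy data are assumptions for illustration only; they are not this model's recorded settings.

```python
import random

def split_and_sample(ids, block_size, batch_size, train_fraction=0.9, seed=0):
    """Split token ids into train/val and draw one training batch.

    Each target sequence is its input sequence shifted by one position,
    so every position in the context supplies a next-character example.
    """
    rng = random.Random(seed)
    n_train = int(train_fraction * len(ids))
    train, val = ids[:n_train], ids[n_train:]

    # Random start offsets that leave room for block_size inputs plus one target.
    starts = [rng.randrange(len(train) - block_size) for _ in range(batch_size)]
    x = [train[s : s + block_size] for s in starts]
    y = [train[s + 1 : s + block_size + 1] for s in starts]
    return train, val, x, y

# Toy stand-in for an encoded text: the integers 0..999.
ids = list(range(1000))
train, val, x, y = split_and_sample(ids, block_size=8, batch_size=4)
print(x[0])
print(y[0])
```

The validation portion is held out from sampling entirely, which is what lets the validation loss measure generalization and not memorization.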

Final metrics:

  • Training loss: 0.8848
  • Validation loss: 1.5879
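These losses are easier to interpret as per-character perplexity or bits per character. The sketch below assumes they are mean cross-entropy values in nats, which is what PyTorch's cross-entropy loss reports by default; the helper name `describe_loss` is illustrative.

```python
import math

def describe_loss(loss_nats):
    """Convert a mean cross-entropy loss (in nats) to more readable units."""
    return {
        "perplexity": math.exp(loss_nats),          # effective number of equally likely choices
        "bits_per_char": loss_nats / math.log(2),   # nats -> bits
    }

train_stats = describe_loss(0.8848)
val_stats = describe_loss(1.5879)
gap = 1.5879 - 0.8848  # train/validation gap in nats

print(train_stats)
print(val_stats)
print(f"gap: {gap:.4f} nats")
```

Under that assumption the perplexity is roughly 2.4 characters on the training data and roughly 4.9 on validation. The gap of about 0.70 nats suggests the model fits its training text considerably better than held-out text, which is plausible for a model of this size trained on a single play.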

Training Data

Trained on the full text of Coriolanus by William Shakespeare, preprocessed as a character-level sequence. The vocabulary consists of all unique characters present in the play.

Usage

    import torch
    from torch.nn import functional as F
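A character-level model needs its prompts encoded with the same vocabulary that was built from the training text. A minimal sketch of such a vocabulary follows; the helper names `build_vocab`, `encode`, and `decode` are illustrative, and the short sample string stands in for the full play.

```python
def build_vocab(text):
    """Map each unique character to an integer id, and back."""
    chars = sorted(set(text))                 # sorted for a deterministic id order
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

def encode(s, stoi):
    """Turn a string into a list of character ids."""
    return [stoi[c] for c in s]

def decode(ids, itos):
    """Turn a list of character ids back into a string."""
    return "".join(itos[i] for i in ids)

# Stand-in for the full training text.
sample = "First Citizen:\nBefore we proceed any further, hear me speak."
stoi, itos = build_vocab(sample)
ids = encode("hear me", stoi)
print(len(stoi), ids, repr(decode(ids, itos)))
```

Because the vocabulary contains only characters seen in the training text, `encode` raises a `KeyError` for any other character; a prompt would need to be filtered or mapped onto known characters first.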

This model is shared for educational and research purposes.
