MicroMixer-1-1M is the largest model in the series, with 967,584 (~1M) parameters. It can generate sentences such as "Once upon a time there was a little girl named Lily" with reasonable fluency. It also supports the longest sequence length in the series: 256 tokens, which for a byte-level model means 256 bytes of text.
graph TD
A[Byte Input] --> B[Token Embedding]
B --> C[RoPE Position Encoding]
C --> D[ImprovedMixerLayer ×3]
D --> E[LayerNorm]
E --> F[LM Head]
F --> G[Byte Output]
style A fill:#007BFF,color:#fff
style G fill:#00D620,color:#fff
style D fill:#AE00FF,color:#fff
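The pipeline above starts and ends with raw bytes, so the "tokenizer" is essentially a UTF-8 encode/decode step. A minimal sketch of that idea follows; `SketchByteTokenizer` is a hypothetical stand-in written for illustration, and the repository's own `ByteTokenizer` may differ (for example, in how it handles special tokens or invalid byte sequences).

```python
class SketchByteTokenizer:
    """Hypothetical byte-level tokenizer: one token per UTF-8 byte."""

    vocab_size = 256  # every possible byte value is a token ID

    def encode(self, text):
        # Each UTF-8 byte becomes an integer ID in the range 0..255.
        return list(text.encode("utf-8"))

    def decode(self, ids):
        # Invalid byte sequences are replaced instead of raising, because a
        # small model can emit bytes that do not form valid UTF-8.
        return bytes(ids).decode("utf-8", errors="replace")


tokenizer = SketchByteTokenizer()
ids = tokenizer.encode("Once upon a time")
print(len(ids), tokenizer.decode(ids))
```

Because every ASCII character is a single byte, the 256-token context corresponds to 256 characters of plain English text; non-ASCII characters consume two to four tokens each.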
| Parameter | Value |
|---|---|
| Total Parameters | 967,584 |
| Hidden Dimension | 224 |
| Channel MLP Dimension | 576 |
| Number of Layers | 3 |
| Max Sequence Length | 256 |
| Vocabulary Size | 256 (Byte-level) |
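As a rough sanity check on these numbers, the share of parameters held by the channel-mixing MLPs can be estimated from the table alone. The sketch below assumes each channel MLP is `Linear(224 → 576)` followed by `Linear(576 → 224)`, both with bias terms; the card does not state the bias setting, so treat the result as an estimate rather than an exact breakdown.

```python
hidden_dim = 224
channel_mlp_dim = 576
num_layers = 3
total_params = 967_584  # from the table above


def linear_params(n_in, n_out, bias=True):
    # Weight matrix plus an optional bias vector (bias is an assumption here).
    return n_in * n_out + (n_out if bias else 0)


per_layer = (linear_params(hidden_dim, channel_mlp_dim)
             + linear_params(channel_mlp_dim, hidden_dim))
channel_mlp_total = per_layer * num_layers
share = channel_mlp_total / total_params

print(f"channel MLP per layer: {per_layer:,}")
print(f"channel MLP total:     {channel_mlp_total:,}")
print(f"share of all params:   {share:.1%}")
```

Under these assumptions, roughly four fifths of the model sits in the channel MLPs, with the remainder split between the embedding, token mixing, normalization, and output head.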
┌──────────────────────────────────────────┐
│            ImprovedMixerLayer            │
│  ┌────────────────────────────────────┐  │
│  │ LayerNorm → HyperMixing → Residual │  │ ← Token Mixing
│  ├────────────────────────────────────┤  │
│  │ LayerNorm → MlpBlock → Residual    │  │ ← Channel Mixing
│  └────────────────────────────────────┘  │
└──────────────────────────────────────────┘
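The layer layout above follows a pre-norm residual pattern: normalize, apply a sublayer, then add the result back to the input. The sketch below illustrates that pattern for the channel-mixing half only, in plain Python on a single feature vector. All function names are hypothetical, LayerNorm's learned scale and shift are omitted, and HyperMixing is not shown because the card does not describe its internals.

```python
import math


def layer_norm(x, eps=1e-5):
    # Normalize one feature vector to zero mean and unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def gelu(v):
    # Exact GELU, expressed through the error function.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def linear(x, weight, bias):
    # weight holds one row of input weights per output feature.
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def mlp_block(x, w1, b1, w2, b2):
    # Linear -> GELU -> Linear, applied along the channel dimension.
    hidden = [gelu(h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)


def pre_norm_residual(x, sublayer):
    # LayerNorm -> sublayer -> add the input back (residual connection).
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


# Toy sizes for illustration: 4 channels expanded to 2 hidden units.
x = [1.0, 2.0, 3.0, 4.0]
w1, b1 = [[0.1] * 4, [-0.1] * 4], [0.0, 0.0]
w2, b2 = [[0.0, 0.0] for _ in range(4)], [0.0] * 4
out = pre_norm_residual(x, lambda h: mlp_block(h, w1, b1, w2, b2))
print(out)
```

With the second linear layer zeroed out, the block reduces to the identity, which is the property that makes residual stacks easy to train from a near-identity start.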
Linear → GELU → Linear

| Metric | 500K | 1M | Change |
|---|---|---|---|
| Parameters | 557,328 | 967,584 | 1.7x |
| Hidden Dim | 176 | 224 | 1.3x |
| Channel MLP | 384 | 576 | 1.5x |
| Sequence Length | 128 | 256 | 2x |
Prompt: "Once upon a time"
Output: "Once upon a time there was a little girl named Lily. She loved t..."
Prompt: "Hello"
Output: "Hellog ann he grit litle girls. She love loved in the"
| Model | Parameters | Hidden | Seq Len | Quality |
|---|---|---|---|---|
| 100K | 136,908 | 84 | 64 | ⭐ |
| 300K | 331,680 | 128 | 128 | ⭐⭐ |
| 500K | 557,328 | 176 | 128 | ⭐⭐⭐ |
| 1M | 967,584 | 224 | 256 | ⭐⭐⭐⭐ |
| Limitation | Description |
|---|---|
| Grammatical Errors | Fully grammatical sentences are still difficult to produce |
| Name Instability | A character's name is not kept consistent, e.g. "Lily", "Limmy", "Amby" |
| Short Prompt Issues | Short prompts such as "Hello" or "The weather is" produce near-random output |
| Overfitting | The model overfits to specific TinyStories phrases |
Dataset: TinyStories (`roneneldan/TinyStories` on the Hugging Face Hub)
# Clone the repository first so that the `src` package is importable:
#   git clone https://github.com/llaa33219/MicroMixer-1.git
#   cd MicroMixer-1
import torch
from huggingface_hub import hf_hub_download

from src.model import MicroMixerV2, MicroMixerV2Config
from src.tokenizer import ByteTokenizer

# The architecture settings must match the 1M checkpoint (see the table above).
config = MicroMixerV2Config(
    max_seq_len=256,
    hidden_dim=224,
    channel_mlp_dim=576,
    num_layers=3,
    use_hyper=True,
)
model = MicroMixerV2(config)

# Download the weights from the Hugging Face Hub and load them on the CPU.
weights_path = hf_hub_download("llaa33219/MicroMixer-1-1M-TinyStories", "model.pt")
model.load_state_dict(torch.load(weights_path, map_location="cpu"))
model.eval()

# Encode the prompt as bytes and add a batch dimension.
tokenizer = ByteTokenizer()
input_ids = torch.tensor([tokenizer.encode("Once upon a time")])

# Generate up to 64 new bytes without tracking gradients.
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=64, temperature=0.8, top_k=40)
print(tokenizer.decode(output[0].tolist()))
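The `generate` call above passes `temperature=0.8` and `top_k=40`. For readers unfamiliar with these settings, here is a minimal sketch of temperature-scaled top-k sampling over a single step's logits, in plain Python. `sample_top_k` is a hypothetical helper written for illustration; the repository's `generate` may implement sampling differently.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=40, rng=random):
    """Pick one token ID from the top_k highest logits (illustrative only)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = min(top_k, len(logits))
    # Keep only the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperatures below 1 sharpen the distribution, above 1 flatten it.
    scaled = [logits[i] / temperature for i in top]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [0.1, 5.0, 0.3, -2.0]
greedy = sample_top_k(logits, top_k=1, rng=rng)
draws = {sample_top_k(logits, top_k=2, rng=rng) for _ in range(200)}
print(greedy, sorted(draws))
```

Setting `top_k=1` reduces this to greedy decoding, while larger values admit more candidate bytes at each step. That extra freedom is one reason very small models can drift into misspelled words.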
Part of the MicroMixer-1 research project