nanoOasis: weights
Weights for nanoOasis, a from-scratch reference implementation of a diffusion world model: the "nanoGPT" of the Oasis / GameNGen / DIAMOND paradigm. The model generates the game Snake one frame at a time in response to your arrow keys: there is no game engine.
▶ Play the live demo · Code + docs
Files
| file | what it is |
|---|---|
| dit.pt | the spatiotemporal DiT (13.5M params): model + EMA weights (optimizer state stripped) |
| vae.pt | the ViT-VAE (7.6M params): encodes a 256×192 frame to a latent and back |
| onnx/dit.onnx | the DiT denoiser exported to ONNX, FP16 (for ONNX Runtime Web / WebGPU) |
| onnx/vae_dec.onnx | the VAE decoder exported to ONNX, FP16 |
How it works
A ViT-VAE compresses each 256×192 frame to 48 latent tokens (the game is an 8×6 grid, so one cell = one DiT token). A 13.5M-parameter spatiotemporal DiT predicts the next latent from the past 8 latents + your action, trained with EDM preconditioning + Diffusion Forcing + context-noise augmentation. Four Euler sampling steps are enough to run in real time, so no distillation is needed. At play time the model is the only thing generating frames; a ~30-line deterministic "referee" adjudicates wall/self collisions (the discrete events diffusion models handle unreliably).
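To make the sampling step concrete, here is a minimal pure-Python sketch of EDM preconditioning plus a four-step Euler sampler. It is not the code in the repo: the names `edm_sigmas`, `precondition`, and `euler_sample` are hypothetical, `cond` stands in for the past latents and the action, and the constants (`sigma_data=0.5`, `sigma_min=0.002`, `sigma_max=80`, `rho=7`) are the EDM paper defaults, assumed here because this README does not state the values nanoOasis uses.

```python
import math
import random

SIGMA_DATA = 0.5  # assumption: EDM paper default, not taken from nanoOasis


def edm_sigmas(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM noise schedule: n_steps levels from sigma_max down to sigma_min, then 0."""
    hi, lo = sigma_max ** (1 / rho), sigma_min ** (1 / rho)
    denom = max(n_steps - 1, 1)
    levels = [(hi + i / denom * (lo - hi)) ** rho for i in range(n_steps)]
    return levels + [0.0]


def precondition(raw_net, sigma_data=SIGMA_DATA):
    """Wrap a raw network F into the EDM denoiser D = c_skip*x + c_out*F(c_in*x, c_noise)."""
    def denoise(x, sigma, cond=None):
        s2 = sigma ** 2 + sigma_data ** 2
        c_skip = sigma_data ** 2 / s2
        c_out = sigma * sigma_data / math.sqrt(s2)
        c_in = 1.0 / math.sqrt(s2)
        c_noise = math.log(sigma) / 4.0
        f = raw_net([c_in * v for v in x], c_noise, cond)
        return [c_skip * v + c_out * fv for v, fv in zip(x, f)]
    return denoise


def euler_sample(denoise, n_values, cond=None, n_steps=4, rng=None):
    """Start from pure noise and take n_steps Euler steps down to sigma = 0."""
    if rng is None:
        rng = random.Random(0)
    sigmas = edm_sigmas(n_steps)
    x = [rng.gauss(0.0, 1.0) * sigmas[0] for _ in range(n_values)]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = denoise(x, sigma, cond)                    # predicted clean latent
        d = [(v - v0) / sigma for v, v0 in zip(x, x0_hat)]  # dx/dsigma
        x = [v + (sigma_next - sigma) * dv for v, dv in zip(x, d)]
    return x


# Toy check with an oracle denoiser that always predicts the same clean latent.
target = [0.25, -0.5, 1.0]
oracle = lambda x, sigma, cond=None: list(target)
sample = euler_sample(oracle, n_values=len(target))
```

With a perfect denoiser, the final Euler step (to sigma = 0) lands on the predicted clean latent, which is why a handful of steps can suffice when the network's prediction is already sharp. In the real model `x` would be the 48 latent tokens of the next frame rather than a flat list of three floats.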
Trained end-to-end for under $50 on ~500k frames of bot-played Snake.
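As an illustration of the kind of rule the deterministic "referee" applies, the sketch below checks wall and self collisions on the 8×6 grid. It is a guess at the shape of such a function, not the repo's referee: the function name, the `body` representation (list of cells, head first), the action strings, and the `grow` flag are all hypothetical, and the real referee may track state differently.

```python
GRID_W, GRID_H = 8, 6  # from the README: the game is an 8x6 grid

# Hypothetical action names; y grows downward, as in screen coordinates.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def referee_step(body, action, grow=False):
    """Advance the snake one cell and report whether it died.

    body: list of (x, y) cells, head first.
    Returns (new_body, dead). On death the body is returned unchanged.
    """
    dx, dy = MOVES[action]
    head = (body[0][0] + dx, body[0][1] + dy)

    # Wall collision: the new head leaves the grid.
    if not (0 <= head[0] < GRID_W and 0 <= head[1] < GRID_H):
        return body, True

    # The tail cell is vacated this tick unless the snake is growing,
    # so moving into the old tail position is legal when not growing.
    occupied = list(body) if grow else list(body[:-1])

    # Self collision: the new head lands on a cell the body still occupies.
    if head in occupied:
        return body, True

    return [head] + occupied, False
```

For example, `referee_step([(0, 0)], "left")` reports a death because the head would leave the grid, while `referee_step([(1, 1), (1, 2)], "right")` returns the moved body `[(2, 1), (1, 1)]` and `False`.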
Usage
git clone https://github.com/MaruthiV/nanoOasis && cd nanoOasis
pip install -e .
# play it locally (pygame window, arrow keys)
python infer.py --ckpt dit.pt --vae vae.pt --config small
The ONNX files are what the in-browser demo runs; see export.py and demo/ in the repo.
Intended use & limitations
A reference implementation for learning and forking the diffusion-world-model recipe, not a product. It's a 13.5M-parameter model trained for ~$50, and it plays like one: crisp Snake for the first several apples, then long-body coherence frays (diffusion models fumble long thin structures, and error accumulates over a rollout). The demo re-seeds a clean context on death so each life starts fresh.
License
MIT. Built in the lineage of DIAMOND, GameNGen, and Oasis, in the spirit of nanoGPT.