Part of the Nano-World-Model collection 🌍: a minimalist repository for training video world models based on diffusion forcing.

Phase-2 baseline for the pusht environment from the DINO-WM suite. It was trained for 100,000 steps on NanoWM-B/2 with the best-performing configuration from the ablations: pred_target=v (v-prediction), additive action injection, and a cosine noise schedule with zero terminal SNR (ZTSNR).
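
The configuration pairs v-prediction with a zero-terminal-SNR schedule. The following is a minimal sketch of why. It assumes the standard variance-preserving forward process x_t = √ᾱ_t·x_0 + √(1 − ᾱ_t)·ε and works on one scalar value for clarity. The helper names are illustrative, not NanoWM's API. At the terminal step ᾱ_T = 0, so x_T is pure noise and an ε-prediction says nothing about x_0. The v target, by contrast, reduces to −x_0 and stays learnable.

```python
import math

# Illustrative helpers (not NanoWM's API). alpha_bar is the cumulative
# product of the noise schedule at one timestep, in [0, 1).


def add_noise(x0: float, eps: float, alpha_bar: float) -> float:
    """Forward process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps


def v_target(x0: float, eps: float, alpha_bar: float) -> float:
    """Regression target for v-prediction: v = sqrt(a) * eps - sqrt(1 - a) * x0."""
    return math.sqrt(alpha_bar) * eps - math.sqrt(1.0 - alpha_bar) * x0


def x0_from_v(x_t: float, v: float, alpha_bar: float) -> float:
    """Recover the clean sample from the noisy input and a (predicted) v."""
    return math.sqrt(alpha_bar) * x_t - math.sqrt(1.0 - alpha_bar) * v


def eps_from_v(x_t: float, v: float, alpha_bar: float) -> float:
    """Recover the noise from the noisy input and a (predicted) v."""
    return math.sqrt(1.0 - alpha_bar) * x_t + math.sqrt(alpha_bar) * v


# Round-trip check, including the zero-terminal-SNR step alpha_bar = 0,
# where x_t is pure noise and v equals -x0.
x0, eps = 0.7, -1.2
for a in (0.999, 0.5, 0.0):
    x_t = add_noise(x0, eps, a)
    v = v_target(x0, eps, a)
    print(a, x0_from_v(x_t, v, a), eps_from_v(x_t, v, a))
```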

Training script: `src/scripts/phase2/dino_wm_pusht.sh`

| Key | Value |
|---|---|
| Architecture | NanoWM-B/2 (~158.6M params) |
| Dataset | DINO-WM pusht (osf.io/bmw48) |
| Frames × resolution | 4 × 224² (DINO latent space) |
| Context frames | 1 |
| Action injection | additive |
| Steps | 100,000 |
| Batch | 8 per GPU × 8 GPUs (H20), 64 across all GPUs |
| Optimizer | AdamW, lr 1e-4, wd 0.01 |
| Precision | bf16-mixed, torch.compile on |
| Seed | 3407 |
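
Each training clip has 4 frames, 1 of which is context, which leaves three frames to denoise. Under diffusion forcing, every frame carries its own independently sampled noise level. The sketch below is hypothetical: `sample_frame_noise_levels` and its convention of keeping context frames clean (marked with timestep 0 and masked out of the loss) are assumptions, not taken from the NanoWM code. It also draws timesteps uniformly for brevity, whereas the diffusion settings list logit_normal sampling.

```python
import random


def sample_frame_noise_levels(num_frames=4, num_context=1, num_train_steps=1000, rng=None):
    """Hypothetical diffusion-forcing sampler: one noise level per frame.

    Context frames stay clean (timestep 0 used as a marker here) and are
    excluded from the denoising loss. Every other frame gets its own
    independently drawn timestep in [0, num_train_steps).
    """
    rng = rng or random.Random()
    timesteps, loss_mask = [], []
    for frame in range(num_frames):
        if frame < num_context:
            timesteps.append(0)       # clean conditioning frame
            loss_mask.append(False)   # no loss on context
        else:
            timesteps.append(rng.randrange(num_train_steps))  # independent per frame
            loss_mask.append(True)
    return timesteps, loss_mask


timesteps, loss_mask = sample_frame_noise_levels(rng=random.Random(3407))
print(timesteps, loss_mask)
```

Independent per-frame noise levels are what let a diffusion-forcing model condition on a clean frame while denoising the rest at different levels.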

Diffusion settings:

| Key | Value |
|---|---|
| pred_name | v |
| noise_schedule | squaredcos_cap_v2 (cosine) |
| zero_terminal_snr | true |
| timestep_sampling | logit_normal |
| snr_gamma | 5.0 |
| diffusion_steps | 1000 (training) · 250 DDIM steps (sampling) |
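
The diffusion settings can be approximated outside the repo with a small pure-Python sketch. The cosine betas follow the squaredcos_cap_v2 formula as defined in diffusers. The zero-terminal-SNR step mirrors diffusers' `rescale_zero_terminal_snr`, which shifts and scales √ᾱ so the first value is kept and the last becomes exactly 0. Three pieces are assumptions, not NanoWM's confirmed code:

- logit-normal sampling with mean 0 and standard deviation 1;
- the Min-SNR-γ weight min(SNR, γ)/(SNR + 1), the form diffusers training scripts use for v-prediction;
- "trailing" DDIM timestep spacing.

```python
import math
import random

NUM_TRAIN_STEPS = 1000  # diffusion_steps (training)
NUM_SAMPLE_STEPS = 250  # DDIM sampling steps
SNR_GAMMA = 5.0         # snr_gamma


def cosine_betas(n, max_beta=0.999):
    """squaredcos_cap_v2: alpha_bar(t) = cos^2(((t + 0.008) / 1.008) * pi / 2), betas capped."""
    def alpha_bar(t):
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
    return [min(1 - alpha_bar((i + 1) / n) / alpha_bar(i / n), max_beta) for i in range(n)]


def zero_terminal_snr_alphas_cumprod(betas):
    """Rescale sqrt(alpha_bar): keep the first value, force the last to exactly 0."""
    alphas_cumprod, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alphas_cumprod.append(prod)
    sqrt_ab = [math.sqrt(a) for a in alphas_cumprod]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    scale = first / (first - last)
    return [((s - last) * scale) ** 2 for s in sqrt_ab]


def sample_logit_normal_timestep(rng, n=NUM_TRAIN_STEPS, mean=0.0, std=1.0):
    """Assumed form: u ~ N(mean, std^2), t = sigmoid(u), mapped to an index in [0, n)."""
    u = rng.gauss(mean, std)
    s = 1.0 / (1.0 + math.exp(-u))
    return min(int(s * n), n - 1)


def min_snr_v_weight(alpha_bar, gamma=SNR_GAMMA):
    """Min-SNR-gamma loss weight for a v-prediction target: min(SNR, gamma) / (SNR + 1)."""
    snr = alpha_bar / (1.0 - alpha_bar)
    return min(snr, gamma) / (snr + 1.0)


def ddim_trailing_timesteps(n=NUM_TRAIN_STEPS, steps=NUM_SAMPLE_STEPS):
    """'Trailing' spacing: start at t = n - 1, where the rescaled SNR is exactly 0."""
    stride = n // steps
    return [n - 1 - i * stride for i in range(steps)]


alphas_cumprod = zero_terminal_snr_alphas_cumprod(cosine_betas(NUM_TRAIN_STEPS))
rng = random.Random(3407)
t = sample_logit_normal_timestep(rng)
print(t, alphas_cumprod[t], min_snr_v_weight(alphas_cumprod[t]))
schedule = ddim_trailing_timesteps()
print(schedule[:3], schedule[-1])  # → [999, 995, 991] 3
```

Trailing spacing is shown because, with a zero-terminal-SNR schedule, sampling should start from the step where the input is pure noise. A "leading" spacing would begin at t = 996 instead.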

```bash
git clone git@github.com:knightnemo/nano-world-model.git
cd nano-world-model
huggingface-cli download knightnemo/nanowm-b2-dino-wm-pusht-100k --local-dir ./ckpt
```

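```python
import sys

from omegaconf import OmegaConf
from safetensors.torch import load_file

# Run from the repository root so the in-repo `models` package is importable.
sys.path.insert(0, "src")
from models import get_models

# Build the model from the training config shipped with the checkpoint.
cfg = OmegaConf.load("ckpt/config.yaml")
cfg.experiment.infra.compile = False  # disable torch.compile (enabled during training)
model = get_models(cfg).eval()

# Load the weights; strict=True raises on any missing or unexpected key.
state_dict = load_file("ckpt/model.safetensors")
model.load_state_dict(state_dict, strict=True)
```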