Part of the Nano-World-Model collection: 🌍 a minimalist repository for training video world models based on diffusion-forcing.
Scale ablation arm on RT-1 fractal. All other axes (pred_target=v, additive action injection, cosine + ZTSNR schedule, 50k steps) match the B/2 reference, so this row directly isolates the effect of backbone capacity.
Training script: `src/scripts/ablation/scale_l2.sh`

| Key | Value |
|---|---|
| Architecture | NanoWM-L/2 (24 layers, d=1024, patch=2, ~558.7M params) |
| Dataset | RT-1 fractal (lerobot/fractal20220817_data) |
| Frames × resolution | 4 × 256² → 4 × 32² latents (SD-VAE) |
| Context frames | 1 (sequential / self-forcing scheduling) |
| Action injection | additive (7-dim continuous) |
| Steps | 50,000 |
| Batch | 8 per GPU × 8 H20 GPUs = 64 effective |
| Optimizer | AdamW, lr 1e-4, wd 0.01, warmup 1000, grad clip 0.1 after 20k |
| Precision | bf16-mixed (params fp32), VAE fp32, torch.compile on |
| Seed | 3407 |
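The warmup and late-start gradient clipping from the table can be sketched in plain Python. This is a standalone sketch, not the repo's trainer: the table only specifies the warmup length, so the constant post-warmup learning rate below is an assumption.

```python
def lr_at(step: int, base_lr: float = 1e-4, warmup: int = 1000) -> float:
    """Linear warmup to base_lr over the first `warmup` steps.

    The schedule after warmup is not stated in the table and is
    assumed constant here.
    """
    if step < warmup:
        return base_lr * (step + 1) / warmup
    return base_lr


def max_grad_norm(step: int, start: int = 20_000, clip: float = 0.1):
    """Gradient clipping is only enabled after 20k steps ("grad clip 0.1
    after 20k"); return None (no clipping) before that."""
    return clip if step >= start else None
```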

| Key | Value |
|---|---|
| pred_name | v |
| noise_schedule | squaredcos_cap_v2 (cosine) |
| zero_terminal_snr | true |
| timestep_sampling | logit_normal (SD3-style, μ=0, σ=1) |
| snr_gamma | 5.0 (Min-SNR loss weighting) |
| diffusion_steps | 1000 train · 250 DDIM sample |
| history_stabilization_level (inference) | 0.02 |
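The schedule choices above (cosine betas, zero-terminal-SNR rescaling, logit-normal timestep sampling, Min-SNR-5 weighting) can be sketched numerically. This is an illustrative sketch, not the repo's code; the Min-SNR weight shown is the v-prediction variant, min(SNR, γ)/(SNR + 1).

```python
import math
import random

import numpy as np


def cosine_alphas_cumprod(T: int = 1000) -> np.ndarray:
    """squaredcos_cap_v2: betas from a squared-cosine alpha-bar, capped at 0.999."""
    f = lambda s: np.cos((s / T + 0.008) / 1.008 * math.pi / 2) ** 2
    t = np.arange(T)
    betas = np.minimum(1 - f(t + 1) / f(t), 0.999)
    return np.cumprod(1 - betas)


def rescale_zero_terminal_snr(alphas_cumprod: np.ndarray) -> np.ndarray:
    """Shift and scale sqrt(alpha-bar) so the final step has exactly zero SNR
    (the ZTSNR fix of Lin et al., 2023)."""
    s = np.sqrt(alphas_cumprod)
    s = (s - s[-1]) * s[0] / (s[0] - s[-1])
    return s ** 2


def sample_timestep(mu: float = 0.0, sigma: float = 1.0, T: int = 1000) -> int:
    """SD3-style logit-normal sampling: sigmoid of a Gaussian, mapped to a step."""
    u = 1.0 / (1.0 + math.exp(-random.gauss(mu, sigma)))
    return min(int(u * T), T - 1)


def min_snr_weight(snr: float, gamma: float = 5.0) -> float:
    """Min-SNR-γ loss weight, v-prediction form: min(SNR, γ) / (SNR + 1)."""
    return min(snr, gamma) / (snr + 1.0)
```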
```bash
git clone git@github.com:knightnemo/nano-world-model.git
cd nano-world-model
huggingface-cli download knightnemo/nanowm-l2-rt1-abl-scale-l2-50k --local-dir ./ckpt
```
```python
import sys

from omegaconf import OmegaConf
from safetensors.torch import load_file

# Make the repo's `src/` package importable.
sys.path.insert(0, "src")
from models import get_models

# Rebuild the architecture from the checkpoint's own config;
# disable torch.compile for plain eager-mode loading.
cfg = OmegaConf.load("ckpt/config.yaml")
cfg.experiment.infra.compile = False
model = get_models(cfg).eval()

# Load the weights with strict key checking.
state_dict = load_file("ckpt/model.safetensors")
model.load_state_dict(state_dict, strict=True)
```
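For sampling, a scheduler matching the diffusion table (1000 training steps, cosine betas, zero terminal SNR, v-prediction, 250 DDIM steps) could be configured with diffusers. This is a sketch assuming the stock `DDIMScheduler`; the repo may ship its own sampler, and the `trailing` spacing is an assumption (it is the spacing usually recommended alongside ZTSNR).

```python
from diffusers import DDIMScheduler

scheduler = DDIMScheduler(
    num_train_timesteps=1000,            # diffusion_steps (train)
    beta_schedule="squaredcos_cap_v2",   # cosine noise schedule
    prediction_type="v_prediction",      # pred_name = v
    rescale_betas_zero_snr=True,         # zero_terminal_snr = true
    timestep_spacing="trailing",         # assumption: recommended with ZTSNR
)
scheduler.set_timesteps(250)             # 250 DDIM sampling steps
```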