Part of the Nano-World-Model collection: 🌍 a minimalist repository for training video world models based on diffusion-forcing.
Scale ablation arm on RT-1 fractal. All other axes (pred_target=v, additive action injection, cosine + ZTSNR schedule, 50k steps) match the B/2 reference, so this row directly isolates the effect of backbone capacity.
Training script: `src/scripts/ablation/scale_l2.sh`

| Key | Value |
|---|---|
| Architecture | NanoWM-L/2 (24 layers, d=1024, patch=2, ~558.7M params) |
| Dataset | RT-1 fractal (lerobot/fractal20220817_data) |
| Frames × resolution | 4 × 256² → 4 × 32² latents (SD-VAE) |
| Context frames | 1 (sequential / self-forcing scheduling) |
| Action injection | additive (7-dim continuous) |
| Steps | 50,000 |
| Batch | 8 per GPU × 8 H20 GPUs = 64 effective |
| Optimizer | AdamW, lr 1e-4, wd 0.01, warmup 1000, grad clip 0.1 after 20k |
| Precision | bf16-mixed (params fp32), VAE fp32, torch.compile on |
| Seed | 3407 |
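The warmup and late-start gradient clipping from the table can be sketched in plain Python. This is a standalone sketch, not the repo's trainer: the table only specifies the warmup length, so the constant post-warmup learning rate below is an assumption.

```python
def lr_at(step: int, base_lr: float = 1e-4, warmup: int = 1000) -> float:
    """Linear warmup to base_lr over the first `warmup` steps.

    The schedule after warmup is not stated in the table and is
    assumed constant here.
    """
    if step < warmup:
        return base_lr * (step + 1) / warmup
    return base_lr


def max_grad_norm(step: int, start: int = 20_000, clip: float = 0.1):
    """Gradient clipping is only enabled after 20k steps ("grad clip 0.1
    after 20k"); return None (no clipping) before that."""
    return clip if step >= start else None
```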

| Key | Value |
|---|---|
| pred_name | v |
| noise_schedule | squaredcos_cap_v2 (cosine) |
| zero_terminal_snr | true |
| timestep_sampling | logit_normal (SD3-style, μ=0, σ=1) |
| snr_gamma | 5.0 (Min-SNR loss weighting) |
| diffusion_steps | 1000 train · 250 DDIM sample |
| history_stabilization_level (inference) | 0.02 |
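The schedule choices above (cosine betas, zero-terminal-SNR rescaling, logit-normal timestep sampling, Min-SNR-5 weighting) can be sketched numerically. This is an illustrative sketch, not the repo's code; the Min-SNR weight shown is the v-prediction variant, min(SNR, γ)/(SNR + 1).

```python
import math
import random

import numpy as np


def cosine_alphas_cumprod(T: int = 1000) -> np.ndarray:
    """squaredcos_cap_v2: betas from a squared-cosine alpha-bar, capped at 0.999."""
    f = lambda s: np.cos((s / T + 0.008) / 1.008 * math.pi / 2) ** 2
    t = np.arange(T)
    betas = np.minimum(1 - f(t + 1) / f(t), 0.999)
    return np.cumprod(1 - betas)


def rescale_zero_terminal_snr(alphas_cumprod: np.ndarray) -> np.ndarray:
    """Shift and scale sqrt(alpha-bar) so the final step has exactly zero SNR
    (the ZTSNR fix of Lin et al., 2023)."""
    s = np.sqrt(alphas_cumprod)
    s = (s - s[-1]) * s[0] / (s[0] - s[-1])
    return s ** 2


def sample_timestep(mu: float = 0.0, sigma: float = 1.0, T: int = 1000) -> int:
    """SD3-style logit-normal sampling: sigmoid of a Gaussian, mapped to a step."""
    u = 1.0 / (1.0 + math.exp(-random.gauss(mu, sigma)))
    return min(int(u * T), T - 1)


def min_snr_weight(snr: float, gamma: float = 5.0) -> float:
    """Min-SNR-γ loss weight, v-prediction form: min(SNR, γ) / (SNR + 1)."""
    return min(snr, gamma) / (snr + 1.0)
```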
```bash
git clone git@github.com:knightnemo/nano-world-model.git
cd nano-world-model
huggingface-cli download knightnemo/nanowm-l2-rt1-abl-scale-l2-50k --local-dir ./ckpt
```
```python
import sys

from omegaconf import OmegaConf
from safetensors.torch import load_file

# Make the repo's `src/` package importable.
sys.path.insert(0, "src")
from models import get_models

# Rebuild the architecture from the checkpoint's own config;
# disable torch.compile for plain eager-mode loading.
cfg = OmegaConf.load("ckpt/config.yaml")
cfg.experiment.infra.compile = False
model = get_models(cfg).eval()

# Load the weights with strict key checking.
state_dict = load_file("ckpt/model.safetensors")
model.load_state_dict(state_dict, strict=True)
```
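For sampling, a scheduler matching the diffusion table (1000 training steps, cosine betas, zero terminal SNR, v-prediction, 250 DDIM steps) could be configured with diffusers. This is a sketch assuming the stock `DDIMScheduler`; the repo may ship its own sampler, and the `trailing` spacing is an assumption (it is the spacing usually recommended alongside ZTSNR).

```python
from diffusers import DDIMScheduler

scheduler = DDIMScheduler(
    num_train_timesteps=1000,            # diffusion_steps (train)
    beta_schedule="squaredcos_cap_v2",   # cosine noise schedule
    prediction_type="v_prediction",      # pred_name = v
    rescale_betas_zero_snr=True,         # zero_terminal_snr = true
    timestep_spacing="trailing",         # assumption: recommended with ZTSNR
)
scheduler.set_timesteps(250)             # 250 DDIM sampling steps
```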