NeuronSpark-V4-1.16B-Pretrain

NeuronSpark V4 autoregressive pretraining checkpoint.

This repository contains a complete training checkpoint for continued pretraining (model weights, DeepSpeed state, and training counters), not only inference weights.

Checkpoint

  • Architecture: NeuronSpark V4 causal language model
  • Scale: 1.16B parameters
  • Checkpoint step: 10500
  • Tokens seen: 2,063,372,760 supervised tokens
  • Sequence length: 2048
  • Training mode: autoregressive pretraining
  • Optimizer: Muon + Adam + Lion
  • DeepSpeed: ZeRO stage 0 (no sharding of parameters, gradients, or optimizer states)
  • Precision: bf16 training

Included Files

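  • model.safetensors: Hugging Face-format model weights for loading and evaluation (stored as a mix of F32 and BF16 tensors).
  • config.json, configuration_neuronspark.py, modeling_neuronspark.py: self-contained custom model configuration and modeling code (loading through transformers requires trust_remote_code=True).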
  • tokenizer.json, tokenizer_config.json, chat_template.jinja: tokenizer assets.
  • training_state.pth: saved training step and token counter.
  • deepspeed/: DeepSpeed checkpoint state for continued training.
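Before resuming, it can help to confirm that a downloaded snapshot actually contains every entry listed above. The following is a minimal sketch using only the Python standard library. The helper name missing_checkpoint_files is hypothetical, and it checks presence only: it does not validate file contents or the internal layout of deepspeed/.

```python
from pathlib import Path

# Entries listed under "Included Files" for this checkpoint.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "configuration_neuronspark.py",
    "modeling_neuronspark.py",
    "tokenizer.json",
    "tokenizer_config.json",
    "chat_template.jinja",
    "training_state.pth",
]
EXPECTED_DIRS = ["deepspeed"]


def missing_checkpoint_files(snapshot_dir):
    """Return expected entries that are absent (directories also count if empty).

    Hypothetical helper: presence check only, no integrity or format validation.
    """
    root = Path(snapshot_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    for name in EXPECTED_DIRS:
        d = root / name
        # A missing or empty deepspeed/ directory cannot be resumed from.
        if not d.is_dir() or not any(d.iterdir()):
            missing.append(name + "/")
    return missing
```

For example, calling missing_checkpoint_files("<downloaded_checkpoint_dir>") and stopping if the returned list is non-empty avoids launching an 8-GPU job against an incomplete download.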

Continue Training

Download a snapshot of this repository, then resume with the original training script, passing the downloaded directory to --resume:

deepspeed --num_gpus=8 train_pretrain.py \
  --config_json configs/smoke_1p16b.json \
  --data_path <pretokenized_data_dir> \
  --tokenizer_path tokenizer_v3 \
  --out_dir <new_output_dir> \
  --deepspeed_config configs/ds_zero0_v4.json \
  --max_length 2048 \
  --batch_size 12 \
  --accumulation_steps 1 \
  --optimizer muon_adam_lion \
  --learning_rate 2e-4 \
  --muon_lr 0.005 \
  --lion_lr 1e-4 \
  --warmup_iters 500 \
  --grad_clip 0.5 \
  --resume <downloaded_checkpoint_dir>
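As a rough consistency check, the flags above imply a nominal number of tokens per optimizer step (GPUs × per-GPU batch × accumulation steps × sequence length). This can be compared with the average implied by the checkpoint counters. The sketch assumes the earlier run used the same 8-GPU layout and full-length sequences; neither is stated in this card, so treat the comparison as approximate.

```python
# Flags from the resume command above.
NUM_GPUS = 8
BATCH_SIZE = 12           # per-GPU micro-batch (--batch_size)
ACCUMULATION_STEPS = 1    # --accumulation_steps
MAX_LENGTH = 2048         # --max_length

# Counters reported for this checkpoint.
CHECKPOINT_STEP = 10_500
TOKENS_SEEN = 2_063_372_760


def nominal_tokens_per_step(num_gpus, batch_size, accumulation_steps, max_length):
    """Tokens per optimizer step if every sequence is exactly max_length long."""
    return num_gpus * batch_size * accumulation_steps * max_length


nominal = nominal_tokens_per_step(NUM_GPUS, BATCH_SIZE, ACCUMULATION_STEPS, MAX_LENGTH)
observed = TOKENS_SEEN / CHECKPOINT_STEP

print(f"nominal tokens/step:  {nominal:,}")
print(f"observed tokens/step: {observed:,.1f}")
print(f"ratio: {observed / nominal:.4f}")
```

With these values the nominal figure is 196,608 tokens per step, and the observed average is about 196,511.7, roughly 99.95% of nominal. That is close to 96 × 2047 = 196,512. This would be consistent with counting 2047 next-token targets per 2048-token sequence, but that is an inference from the numbers, not something the card states.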

Provenance

This checkpoint comes from the current NeuronSpark V4 branch. It is not part of the earlier V2.5/V3 checkpoint family.
