YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

AGILLM 4.3 โ€” Autoregressive + DiffusionBlock + MoE Language Model

Single-file implementation: agillm41.py Parameters: 1.22B (1,221,580,802) Architecture: d_model=1280, layers=28, heads=20, d_k=64, rank=160 (2.5ร— expansion), tied weights


โš ๏ธ CHECKPOINT PROVENANCE โ€” READ FIRST

Checkpoint filenames (e.g. pretrain_step00050650.pt) reflect the step counter within the current training run, NOT total training steps.

This model warm-started from step 2,182,564 (~2.1M steps) of a prior run.

What the filename says What it actually means
pretrain_step00050650.pt Current-run step 50,650
True total steps โ‰ˆ 2,182,564 + 50,650 = ~2,233,214 steps
Tokens seen (current run) ~4.2B / 67.2B target (6.25%)

Checkpoints live in:

checkpoints/warmstart_step2182564__current_step50650/

The folder name is the canonical reference for provenance.


Architecture

Component Value
Backbone Autoregressive transformer (AR)
DiffusionBlocks Active โ€” layers cycle AR/SAT/NAT objectives
Mixture-of-Experts Active โ€” 14 slots per block
d_model 1280
Layers 28
Attention heads 20
Tied weights Yes
Tokenizer Llama-compatible (from checkpoint)

Training Fleet (as of 2026-06-24)

  • FedA (41441116): 2ร— V100-SXM2-32GB, ssh2.vast.ai:11116, $0.0593/hr
    • a0: role=coverage, B=56, L=1536
    • a1: role=hard-blocks, B=48, L=1536
  • Target: 67.2B tokens total
  • Budget runway: ~Jul 24, 2026

Inference

# AR mode (standard autoregressive)
python3 agillm41.py infer \
  --ckpt checkpoints/warmstart_step2182564__current_step50650/pretrain_step00050650.pt \
  --prompt "Your prompt here" \
  --mode ar --max_new 100 --plain-output --block_stream

# SAT mode (score-and-threshold diffusion)
python3 agillm41.py infer ... --mode sat

# NAT mode (non-autoregressive diffusion)
python3 agillm41.py infer ... --mode nat

Note: If both GPUs are busy with training, add CUDA_VISIBLE_DEVICES="" to force CPU inference (slow but functional: ~1.2 tok/s).

Dependency: agillm_checkpoint_provenance.py must be in the same directory as agillm41.py.


Current Inference Quality (step ~50,650 / ~2.23M total)

See INFERENCE_QUALITY.md for AR/SAT/NAT benchmark outputs at each major checkpoint.

At this training stage (6.25% of token target), output is partially coherent โ€” the model knows structure, names, dates, and grammar patterns but has not yet converged on fluent generation. Expect significant quality improvement as training approaches 67B tokens.


Repositories

Repo Type Notes
Marxist-Leninist/agillm4.3-private GitHub private Source of truth for code
Marxist-Leninist/AGILLM4.3 GitHub public Mirror
Marxist-Leninist/AGILLM4.1 GitHub public Mirror (same codebase)
Marxist-Leninist/agillm4.1-private GitHub private Mirror
OpenTransformer/agillm4.3-private HuggingFace private Code + checkpoints
OpenTransformer/AGILLM-4.3 HuggingFace public Code + checkpoints

For Future Claude/AI Agents

MCP memory (Silicon Goddess) slot index for AGILLM4.3 state: slots 42, 95, 481โ€“525+. Standing instruction: always run AR + SAT + NAT inference checks before reporting training healthy. See INFERENCE_QUALITY.md.

Latest Inference Smoke Test - 2026-06-26

Latest smoke-test artifacts were uploaded under training/agillm43_shared/inference/20260626T183400Z/.

  • Monolithic latest-checkpoint AR: /workspace/agillm4_v100a0_ckpts/pretrain_step00065633_from00050650_20260626T1811Z.pt, 32 tokens at 5.0 tok/s on CPU.
  • Distributed AR: existing 2026-06-06 split packages across GETH/MCP/Prime/communist-web, 32 tokens at 1.504 tok/s.
  • Status aliases: training/agillm43_shared/status/latest_inference.md and .json.
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support