# AGILLM 4.3: Autoregressive + DiffusionBlock + MoE Language Model
Single-file implementation: agillm41.py
Parameters: 1.22B (1,221,580,802)
Architecture: d_model=1280, layers=28, heads=20, d_k=64, rank=160 (2.5× expansion), tied weights
## ⚠️ CHECKPOINT PROVENANCE: READ FIRST
Checkpoint filenames (e.g. pretrain_step00050650.pt) reflect the step counter within the current training run, NOT total training steps.
This model warm-started from step 2,182,564 (~2.18M steps) of a prior run.
| What the filename says | What it actually means |
|---|---|
| `pretrain_step00050650.pt` | Current-run step 50,650 |
| True total steps | 2,182,564 + 50,650 = ~2,233,214 steps |
| Tokens seen (current run) | ~4.2B / 67.2B target (6.25%) |
Checkpoints live in `checkpoints/warmstart_step2182564__current_step50650/`.
The folder name is the canonical reference for provenance.
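The step arithmetic in the table can be done mechanically from a checkpoint path. The sketch below is a hypothetical helper (`parse_provenance` and `Provenance` are illustrative names, not the interface of `agillm_checkpoint_provenance.py`); it assumes the folder always follows the `warmstart_step<N>__current_step<M>` pattern and that filenames begin with `pretrain_step<digits>`.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical patterns inferred from the names used in this README.
FOLDER_RE = re.compile(r"warmstart_step(\d+)__current_step\d+")
FILE_RE = re.compile(r"pretrain_step(\d+)")


@dataclass(frozen=True)
class Provenance:
    warmstart_step: int  # step of the prior run this run warm-started from
    current_step: int    # step counter within the current run (from the filename)

    @property
    def total_steps(self) -> int:
        # True total = prior-run steps + current-run steps.
        return self.warmstart_step + self.current_step


def parse_provenance(ckpt_path: str) -> Provenance:
    """Read the warm-start offset from the folder and the current-run step from the filename."""
    path = PurePosixPath(ckpt_path)
    folder_m = FOLDER_RE.fullmatch(path.parent.name)
    file_m = FILE_RE.match(path.name)
    if folder_m is None or file_m is None:
        raise ValueError(f"unrecognised checkpoint path: {ckpt_path}")
    return Provenance(
        warmstart_step=int(folder_m.group(1)),
        current_step=int(file_m.group(1)),  # int() drops the zero padding
    )


p = parse_provenance(
    "checkpoints/warmstart_step2182564__current_step50650/pretrain_step00050650.pt"
)
print(p.total_steps)  # 2233214
```

Reading the offset from the folder rather than hard-coding 2,182,564 keeps the calculation tied to the canonical provenance reference.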
## Architecture
| Component | Value |
|---|---|
| Backbone | Autoregressive transformer (AR) |
| DiffusionBlocks | Active (layers cycle AR/SAT/NAT objectives) |
| Mixture-of-Experts | Active (14 slots per block) |
| d_model | 1280 |
| Layers | 28 |
| Attention heads | 20 |
| Tied weights | Yes |
| Tokenizer | Llama-compatible (from checkpoint) |
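The headline dimensions are related: the per-head width d_k follows from d_model and the head count. The config sketch below is illustrative only (the class and field names are hypothetical, not taken from `agillm41.py`), and it assumes the "2.5× expansion" quoted for rank=160 is measured relative to d_k.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgillmConfig:
    """Hypothetical config mirroring the numbers in this README."""
    d_model: int = 1280
    n_layers: int = 28
    n_heads: int = 20
    rank: int = 160
    moe_slots_per_block: int = 14
    tied_weights: bool = True

    @property
    def d_k(self) -> int:
        # Per-head width: d_model split evenly across the attention heads.
        if self.d_model % self.n_heads:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads

    @property
    def rank_expansion(self) -> float:
        # Assumption: the quoted expansion factor is rank / d_k.
        return self.rank / self.d_k


cfg = AgillmConfig()
print(cfg.d_k, cfg.rank_expansion)  # 64 2.5
```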
## Training Fleet (as of 2026-06-24)
- FedA (41441116): 2× V100-SXM2-32GB, ssh2.vast.ai:11116, $0.0593/hr
  - a0: role=coverage, B=56, L=1536
  - a1: role=hard-blocks, B=48, L=1536
- Target: 67.2B tokens total
- Budget runway: ~Jul 24, 2026
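A quick back-of-envelope check of the fleet numbers, as a sketch under several assumptions: B and L are per-batch batch size and sequence length with no padding loss or gradient accumulation, the $0.0593/hr rate is the all-in price for the whole machine, and the machine runs continuously from 2026-06-24 to the runway date. The helper names are hypothetical.

```python
from datetime import date

# Figures copied from the fleet list above.
HOURLY_RATE_USD = 0.0593
ROLES = {"a0": {"B": 56, "L": 1536}, "a1": {"B": 48, "L": 1536}}


def tokens_per_batch(batch_size: int, seq_len: int) -> int:
    """Tokens in one B x L batch (assumes every position is a real token)."""
    return batch_size * seq_len


def runway_cost(start: date, end: date, rate_per_hour: float) -> float:
    """Cost of running continuously from start to end at a flat hourly rate."""
    hours = (end - start).days * 24
    return hours * rate_per_hour


for name, cfg in ROLES.items():
    print(name, tokens_per_batch(cfg["B"], cfg["L"]))
# a0 86016
# a1 73728

print(round(runway_cost(date(2026, 6, 24), date(2026, 7, 24), HOURLY_RATE_USD), 2))
# 42.7
```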
## Inference
```bash
# AR mode (standard autoregressive)
python3 agillm41.py infer \
  --ckpt checkpoints/warmstart_step2182564__current_step50650/pretrain_step00050650.pt \
  --prompt "Your prompt here" \
  --mode ar --max_new 100 --plain-output --block_stream

# SAT mode (score-and-threshold diffusion)
python3 agillm41.py infer ... --mode sat

# NAT mode (non-autoregressive diffusion)
python3 agillm41.py infer ... --mode nat
```
Note: if both GPUs are busy with training, prefix the command with `CUDA_VISIBLE_DEVICES=""` to force CPU inference (slow but functional: ~1.2 tok/s).
Dependency: `agillm_checkpoint_provenance.py` must be in the same directory as `agillm41.py`.
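The three modes, the CPU fallback and the dependency check can be wrapped in one script. This is a minimal sketch with hypothetical helper names; it assumes SAT and NAT accept the same flags as the AR example (the README elides them with `...`), that `infer` prints its generation to stdout, and it launches the script with the current interpreter (`sys.executable`) instead of a literal `python3`.

```python
import os
import subprocess
import sys
from pathlib import Path

MODES = ("ar", "sat", "nat")


def build_infer_cmd(script: Path, ckpt: str, prompt: str, mode: str, max_new: int = 100) -> list[str]:
    """Build one infer command; SAT/NAT reuse the AR flags (assumption)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return [
        sys.executable, str(script), "infer",
        "--ckpt", ckpt,
        "--prompt", prompt,
        "--mode", mode,
        "--max_new", str(max_new),
        "--plain-output", "--block_stream",
    ]


def infer_env(force_cpu: bool) -> dict[str, str]:
    """Copy the environment; an empty CUDA_VISIBLE_DEVICES hides all GPUs."""
    env = dict(os.environ)
    if force_cpu:
        env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_all_modes(script: Path, ckpt: str, prompt: str, force_cpu: bool = False) -> dict[str, str]:
    # agillm_checkpoint_provenance.py must sit next to agillm41.py.
    dep = script.with_name("agillm_checkpoint_provenance.py")
    if not dep.exists():
        raise FileNotFoundError(f"missing dependency: {dep}")
    outputs = {}
    for mode in MODES:
        # check=True raises CalledProcessError if a mode exits non-zero.
        result = subprocess.run(
            build_infer_cmd(script, ckpt, prompt, mode),
            env=infer_env(force_cpu), capture_output=True, text=True, check=True,
        )
        outputs[mode] = result.stdout
    return outputs
```

For example, `run_all_modes(Path("agillm41.py"), "checkpoints/warmstart_step2182564__current_step50650/pretrain_step00050650.pt", "Your prompt here", force_cpu=True)` would run all three checks on CPU.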
## Current Inference Quality (step ~50,650 / ~2.23M total)
See INFERENCE_QUALITY.md for AR/SAT/NAT benchmark outputs at each major checkpoint.
At this training stage (6.25% of the token target), output is only partially coherent: the model reproduces structure, names, dates, and grammatical patterns but has not yet converged on fluent generation. Expect significant quality improvement as training approaches the 67.2B-token target.
## Repositories
| Repo | Type | Notes |
|---|---|---|
| Marxist-Leninist/agillm4.3-private | GitHub private | Source of truth for code |
| Marxist-Leninist/AGILLM4.3 | GitHub public | Mirror |
| Marxist-Leninist/AGILLM4.1 | GitHub public | Mirror (same codebase) |
| Marxist-Leninist/agillm4.1-private | GitHub private | Mirror |
| OpenTransformer/agillm4.3-private | HuggingFace private | Code + checkpoints |
| OpenTransformer/AGILLM-4.3 | HuggingFace public | Code + checkpoints |
## For Future Claude/AI Agents
MCP memory (Silicon Goddess) slot index for AGILLM4.3 state: slots 42, 95, 481–525+.
Standing instruction: always run AR + SAT + NAT inference checks before reporting training healthy. See INFERENCE_QUALITY.md.
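The standing instruction can be expressed as a simple gate. The sketch below is hypothetical (the function name and the dict-of-outputs shape are assumptions, with mode names matching the `--mode` values); it only checks that every mode ran and produced non-empty text, which is a minimal proxy and not a substitute for reviewing the outputs against `INFERENCE_QUALITY.md`.

```python
REQUIRED_MODES = ("ar", "sat", "nat")


def training_healthy(outputs: dict[str, str]) -> bool:
    """True only if AR, SAT and NAT each produced some non-whitespace output."""
    # A missing mode counts as an empty output, so it fails the gate.
    return all(outputs.get(mode, "").strip() for mode in REQUIRED_MODES)
```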
## Latest Inference Smoke Test (2026-06-26)
Latest smoke-test artifacts were uploaded under training/agillm43_shared/inference/20260626T183400Z/.
- Monolithic latest-checkpoint AR: `/workspace/agillm4_v100a0_ckpts/pretrain_step00065633_from00050650_20260626T1811Z.pt`, 32 tokens at 5.0 tok/s on CPU.
- Distributed AR: existing 2026-06-06 split packages across GETH/MCP/Prime/communist-web, 32 tokens at 1.504 tok/s.
- Status aliases: `training/agillm43_shared/status/latest_inference.md` and `.json`.