Mamba3-2.7B (Alpha)

This model is a structurally transmuted version of Mamba2-2.7B, migrated to the Mamba3 architecture using the Subsuminator framework.

🔬 Alpha: base weights transmuted, instruction tuning not yet applied. CE ratio vs Mamba2-2.7B baseline: confirmed ≤1.05x on A10G (actual ratio: 1.0016x).
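
A minimal sketch of how a ratio like the one above could be computed, assuming "CE" means mean token-level cross-entropy and that both models are scored on the same target tokens. The function names and the default 1.05 threshold argument are illustrative. This is not the code in check_ce.py.

```python
import math

def cross_entropy(logits, targets):
    """Mean token-level cross-entropy (in nats) from raw logits.

    logits:  list of per-position score lists, one score per vocabulary entry
    targets: list of correct token indices, one per position
    """
    total = 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp over the vocabulary.
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[target]
    return total / len(targets)

def ce_ratio(candidate_logits, baseline_logits, targets):
    """Candidate CE divided by baseline CE on the same targets (assumed definition)."""
    return cross_entropy(candidate_logits, targets) / cross_entropy(baseline_logits, targets)

def passes_ce_check(ratio, threshold=1.05):
    """True when the candidate is within the allowed degradation factor."""
    return ratio <= threshold
```

Under this definition, a ratio of 1.0 means the migrated model matches the baseline loss exactly, and the reported 1.0016x falls well inside the 1.05x bound.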

Model Highlights

  • Architecture: Mamba3 (SISO Trapezoidal + RoPE)
  • Scale: 2.7 Billion Parameters
  • Training Phase: Alpha (Pre-SFT Checkpoint)
  • Precision: BFloat16

Structural Changes Applied

  • Temporal Convolution Folding
  • SiLU Linearization via scalar alpha
  • B/C RMSNorm expectation initialization
  • Trap gate and RoPE zero-init
  • Data-dependent A structural extraction
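
The zero-init item in the list above follows a general pattern: when new components are initialized so that they contribute nothing, the migrated model starts out reproducing the source model's behavior. The sketch below illustrates that pattern only. It assumes RoPE acts as a 2D rotation on feature pairs and that the trapezoidal gate scales an added term. Both function names are hypothetical and this is not the Mamba3 or Subsuminator implementation.

```python
import math

def rotate_pair(x0, x1, theta):
    """Rotate one (x0, x1) feature pair by angle theta, as rotary embeddings do."""
    c, s = math.cos(theta), math.sin(theta)
    return (x0 * c - x1 * s, x0 * s + x1 * c)

def gated_update(old_term, new_term, gate):
    """Add a new contribution scaled by a gate (hypothetical stand-in for the trap gate)."""
    return old_term + gate * new_term

# With a zero angle the rotation is the identity, so features pass through unchanged.
unchanged = rotate_pair(0.3, -1.2, 0.0)

# With a zero gate the new term is ignored, so the original update is preserved.
preserved = gated_update(2.5, 7.0, 0.0)
```

Starting from an identity-preserving initialization is one plausible way to keep the initial loss of the migrated model close to that of the source model, with fine-tuning then free to move the new parameters away from zero.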

This model requires fine-tuning before it can be reliably used for text generation.

Adapter Code

The nine-point weight mapping used to produce this checkpoint is fully open-sourced:

Rta-Forge/heists-galore (mamba2-to-mamba3/)

Includes the weight adapter (mamba3_adapter.py), the empirical activation measurement rig (empirical_fit.py), and the CE verification harness (check_ce.py). No internal dependencies; runs against any standard mamba-ssm installation.

Methodology paper

FORGEry: A Multi-Model Adversarial Research Methodology for Independent AI Researchers (Zenodo preprint, CC BY 4.0).

DOI: 10.5281/zenodo.20374967
