Mamba3-2.7B (Alpha)

This model is a structurally transmuted version of Mamba2-2.7B, migrated to the Mamba3 architecture using the Subsuminator framework.

🔬 Alpha: base weights transmuted, instruction tuning not yet applied. Cross-entropy (CE) ratio vs. the Mamba2-2.7B baseline: confirmed ≤1.05x on an A10G (measured ratio: 1.0016x).
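
To make the acceptance criterion concrete, here is a minimal sketch of how a CE ratio like the one above could be computed. It assumes the ratio is the converted model's mean cross-entropy divided by the baseline's mean cross-entropy on the same tokens. The function names are hypothetical and are not taken from check_ce.py, which may measure things differently.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy (in nats) of one target index given raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def mean_ce(batch_logits, targets):
    """Average cross-entropy over a sequence of (logits, target) pairs."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

def ce_ratio(candidate_ce, baseline_ce):
    """Assumed definition: candidate mean CE divided by baseline mean CE."""
    return candidate_ce / baseline_ce

def passes_gate(ratio, threshold=1.05):
    """True when the ratio stays at or below the acceptance threshold."""
    return ratio <= threshold
```

Under this definition, a ratio of 1.0016 means the converted model's loss is about 0.16% higher than the baseline's, well inside the 1.05 threshold.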

Model Highlights

Architecture: Mamba3 (SISO Trapezoidal + RoPE)
Scale: 2.7 Billion Parameters
Training Phase: Alpha (Pre-SFT Checkpoint)
Precision: BFloat16

Structural Changes Applied

Temporal Convolution Folding
SiLU Linearization via scalar alpha
B/C RMSNorm expectation initialization
Trap gate and RoPE zero-init
Data-dependent A structural extraction
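
Two of the items above can be illustrated with a small, self-contained sketch. It is not the code in mamba3_adapter.py and rests on two assumptions: that "SiLU linearization via scalar alpha" means fitting a single slope alpha so that alpha * x approximates SiLU(x) on measured activations (least squares is used here), and that RoPE zero-init means the rotation angles start at zero, so the rotation is initially the identity. All function names are hypothetical.

```python
import math

def silu(x):
    """SiLU(x) = x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)

def fit_alpha(samples):
    """Least-squares scalar alpha minimizing sum((silu(x) - alpha * x)^2)."""
    num = sum(x * silu(x) for x in samples)
    den = sum(x * x for x in samples)
    if den == 0.0:
        raise ValueError("need at least one non-zero sample")
    return num / den

def rotate_pairs(vec, angles):
    """RoPE-style rotation: rotate each consecutive pair of vec by its angle."""
    out = []
    for i, theta in enumerate(angles):
        a, b = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([a * c - b * s, a * s + b * c])
    return out

# Zero angles leave the vector unchanged, so a zero-initialized rotation
# does not alter the pre-conversion behavior.
unchanged = rotate_pairs([1.0, 2.0, 3.0, 4.0], [0.0, 0.0])
```

Because SiLU(x) - SiLU(-x) = x, a sample set that is symmetric around zero always yields alpha = 0.5 under this fit; a fit on real activations depends on their actual distribution.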

This model requires fine-tuning before it can be reliably used for text generation.

Adapter Code

The nine-point weight mapping used to produce this checkpoint is fully open-sourced:

Rta-Forge/heists-galore — mamba2-to-mamba3/

Includes the weight adapter (mamba3_adapter.py), the empirical activation measurement rig (empirical_fit.py), and the CE verification harness (check_ce.py). No internal dependencies — runs against any standard mamba-ssm installation.

Methodology paper

FORGEry: A Multi-Model Adversarial Research Methodology for Independent AI Researchers — Zenodo preprint, CC BY 4.0.

DOI: 10.5281/zenodo.20374967

Model tree for RtaForge/Mamba3-2.7B

Base model: state-spaces/mamba2-2.7b