---
license: apache-2.0
tags:
- pytorch
- transformer
- mamba
- moe
- hybrid
- matryoshka
- gpt-oss
- adaptive-compute
pipeline_tag: text-generation
---

# GPT-OSS Adamba: Hybrid MoE + Mamba

> **21.9B** parameters | **32 experts** | **Mamba-enhanced** reasoning backbone

**[GitHub](https://github.com/unixsysdev/adamba)** | 🤗 **[Original Adamba](https://huggingface.co/datasysdev/adamba)**

## Available Checkpoints

| Variant | Parameters | Dim | Features | Status | Download |
|---------|------------|-----|----------|--------|----------|
| gptoss_phase1 | 21.9B | 2880 | mamba_integration, moe_32experts | ✅ | [Download](./checkpoints/gptoss_phase1.pt) |
| gptoss_phase2 | 21.9B | 2880 | matryoshka, early_exit, moe_32experts | ⏳ | ❌ |
| gptoss_phase3 | 30B+ | 4096 | matryoshka, early_exit, moe_32experts, expansion | ⏳ | ❌ |
| gptoss_sft | 21.9B | 2880 | matryoshka, moe_32experts, sft | ⏳ | ❌ |

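Every variant lists `moe_32experts`: each MoE block scores all 32 experts and keeps the top 4 per token (see the architecture table below). A hedged sketch of what top-4 routing looks like in general — not the actual router code, and `route_top4` is a hypothetical helper:

```python
import torch

def route_top4(hidden, router_weight, k=4):
    # hidden: (tokens, dim); router_weight: (n_experts, dim).
    logits = hidden @ router_weight.T                   # (tokens, 32) router scores
    top_vals, top_idx = torch.topk(logits, k, dim=-1)   # keep the 4 best experts per token
    gates = torch.softmax(top_vals, dim=-1)             # renormalize weights over the chosen 4
    return gates, top_idx

x = torch.randn(3, 2880)        # 3 tokens at hidden dim 2880
router = torch.randn(32, 2880)  # 32 experts
gates, experts = route_top4(x, router)
```

Each token's output is then a gate-weighted sum of its 4 selected experts' outputs.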
## Architecture

Built on [OpenAI GPT-OSS 20B](https://huggingface.co/openai/gpt-oss-20b) with Mamba integration:

| Component | Spec |
|-----------|------|
| **Base Model** | GPT-OSS 20B MoE |
| **Hidden Dim** | 2880 |
| **Attention** | 24 layers (alternating sliding-window and full attention) |
| **Mamba** | 12 layers (interleaved 2:1) |
| **MoE** | 32 experts, top-4 routing |
| **Vocab** | 201,088 tokens |
| **Total Blocks** | 36 (24 attention + 12 Mamba) |

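The 2:1 interleave above (24 attention + 12 Mamba = 36 blocks) can be sketched as a layer schedule. This is illustrative only — `hybrid_schedule` is a hypothetical helper, and the exact placement inside the checkpoint may differ:

```python
def hybrid_schedule(n_attn=24, n_mamba=12):
    # Repeat A-A-M (two attention blocks per Mamba block) until both budgets run out.
    layers = []
    while n_attn or n_mamba:
        for _ in range(2):
            if n_attn:
                layers.append("A")
                n_attn -= 1
        if n_mamba:
            layers.append("M")
            n_mamba -= 1
    return layers

schedule = hybrid_schedule()  # 36 entries: A, A, M, A, A, M, ...
```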
```
┌─────────────────────────────────────────────────────┐
│ GPT-OSS 20B (Attention + MoE)                       │
│   ↓ Surgery (inject 12 Mamba layers)                │
│ Hybrid: A-A-M-A-A-M-... pattern                     │
│   ↓ Phase 1 (train Mamba only)                      │
│ Mamba learns to "speak GPT-OSS language"            │
│   ↓ Phase 2 (enable Matryoshka)                     │
│ Adaptive compute: 128 → 2880 dim per layer          │
└─────────────────────────────────────────────────────┘
```

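The "128 → 2880 dim per layer" line refers to the Matryoshka idea: one full-width weight contains nested smaller sub-layers, so compute can be dialed down by slicing its leading channels. A minimal sketch under that assumption (`matryoshka_linear` is hypothetical; the real Phase 2 implementation may differ):

```python
import torch

def matryoshka_linear(x, weight, width):
    # Nested widths: the first `width` rows/columns of one full 2880-wide
    # projection act as a smaller, cheaper sub-layer.
    return x[:, :width] @ weight[:width, :width].T

weight = torch.randn(2880, 2880)             # one full-width projection
x = torch.randn(2, 2880)
y_small = matryoshka_linear(x, weight, 128)  # low-compute path
y_full = matryoshka_linear(x, weight, 2880)  # full-compute path
```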
## Training Status

**Phase 1**: Mamba integration (Attention and MoE weights frozen; only the newly injected Mamba layers are trained)

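The Phase 1 freeze can be sketched as follows. This is a hypothetical helper, and it assumes Mamba submodules are identifiable by name — the real module naming in the checkpoint may differ:

```python
import torch.nn as nn

def freeze_for_phase1(model: nn.Module) -> None:
    # Freeze every parameter, then re-enable gradients only for
    # submodules whose name suggests they are Mamba layers (assumption).
    for p in model.parameters():
        p.requires_grad = False
    for name, module in model.named_modules():
        if "mamba" in name.lower():
            for p in module.parameters():
                p.requires_grad = True

# Toy stand-in model to show the effect:
toy = nn.ModuleDict({"attn": nn.Linear(8, 8), "mamba_mixer": nn.Linear(8, 8)})
freeze_for_phase1(toy)
```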
## Usage

```python
# Coming soon - inference code
# See: https://github.com/unixsysdev/adamba
```

## License

Apache 2.0 (same as GPT-OSS)