How to use MinimaML/SaRDinE-14B8x4P with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="MinimaML/SaRDinE-14B8x4P", trust_remote_code=True)
# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("MinimaML/SaRDinE-14B8x4P", trust_remote_code=True, dtype="auto")
How to use MinimaML/SaRDinE-14B8x4P with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MinimaML/SaRDinE-14B8x4P"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "MinimaML/SaRDinE-14B8x4P",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use MinimaML/SaRDinE-14B8x4P with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "MinimaML/SaRDinE-14B8x4P" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "MinimaML/SaRDinE-14B8x4P",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "MinimaML/SaRDinE-14B8x4P" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "MinimaML/SaRDinE-14B8x4P",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use MinimaML/SaRDinE-14B8x4P with Docker Model Runner:
docker model run hf.co/MinimaML/SaRDinE-14B8x4P
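The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the response shape (`choices[0]["text"]`) assumes the server follows the OpenAI completions format, as the curl comments suggest.

```python
import json
import urllib.request


def build_completion_request(base_url, prompt, max_tokens=512, temperature=0.5):
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": "MinimaML/SaRDinE-14B8x4P",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt):
    """Send the request and return the first completion's text (needs a running server)."""
    request = build_completion_request(base_url, prompt)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the server answers in the OpenAI completions response format.
    return body["choices"][0]["text"]


# Building the request needs no network access; only complete() contacts the server.
request = build_completion_request("http://localhost:8000", "Once upon a time,")
print(request.full_url)
```

Port 8000 matches the vLLM example and port 30000 the SGLang one, so the same helper should work for either server by changing `base_url`.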
Sparse Routed Delta Experts on Mistral-14B-Reasoning.
14B base params | 8 experts per layer | ~4% sparsity (alpha)
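SaRDinE is a novel MoE-alternative architecture. Unlike a traditional MoE, which fragments model capacity across experts, SaRDinE keeps the base model frozen and intact and adds sparse delta experts on top of it, routing to the top 2 of the 8 experts in each layer.
Result: full base-model capability plus domain specialization.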
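To make the idea concrete, here is a toy sketch in plain Python of "frozen base output plus top-k routed sparse deltas". It is not the code in `modeling_sardine.py`: the function `routed_delta_output`, the dictionary representation of a sparse delta, and the softmax renormalisation over the selected experts are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def routed_delta_output(base_output, router_scores, expert_deltas, top_k=2):
    """Add the top-k experts' weighted sparse deltas to an unchanged base output."""
    # Pick the top_k experts with the highest router scores.
    order = sorted(range(len(router_scores)), key=lambda i: router_scores[i], reverse=True)
    chosen = order[:top_k]
    # Renormalise the selected scores so the mixing weights sum to 1 (an assumption).
    weights = softmax([router_scores[i] for i in chosen])
    output = list(base_output)  # copy, so the base output itself is never modified
    for weight, idx in zip(weights, chosen):
        # Each sparse delta is stored as {position: value}; most positions are absent.
        for pos, value in expert_deltas[idx].items():
            output[pos] += weight * value
    return output, chosen


base_output = [1.0, 2.0, 3.0, 4.0]  # stands in for the frozen base layer's output
router_scores = [0.1, 2.0, -1.0, 2.0, 0.0, 0.3, -0.5, 0.2]  # one score per expert
expert_deltas = [{} for _ in range(8)]  # 8 experts, as in this model
expert_deltas[1] = {0: 0.2}
expert_deltas[3] = {2: -0.4}

output, chosen = routed_delta_output(base_output, router_scores, expert_deltas)
print(chosen, output)
```

In this toy setup the experts only nudge a few positions of the base output, which is the sense in which the base model's capability is kept while specialization is layered on top.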
import torch
from transformers import AutoTokenizer
# The model class is defined in modeling_sardine.py in the Hub repo; it is downloaded below
# Load tokenizer from base model
tokenizer = AutoTokenizer.from_pretrained(
"mistralai/Ministral-3-14B-Reasoning-2512",
trust_remote_code=True
)
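# Load SRDE model
import os
import sys
from huggingface_hub import hf_hub_download
# hf_hub_download returns the path of the downloaded file, so add its parent directory to sys.path
model_code = hf_hub_download("MinimaML/SaRDinE-14B8x4P", "modeling_sardine.py", local_dir=".")
sys.path.insert(0, os.path.dirname(os.path.abspath(model_code)))
from modeling_sardine import SaRDinEForCausalLM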
model = SaRDinEForCausalLM.from_pretrained(
"MinimaML/SaRDinE-14B8x4P",
torch_dtype=torch.bfloat16,
device_map="auto"
)
# Generate
prompt = "Solve step by step: What is 15% of 80?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
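The generation call above samples with `temperature=0.7`, while the server examples use `0.5`. As a rough illustration of what that parameter does, the sketch below divides a set of logits by the temperature before applying softmax and sampling. The logits are made-up numbers, and the helper names `temperature_probs` and `sample_token` are hypothetical; this is not the model's own sampling code.

```python
import math
import random


def temperature_probs(logits, temperature):
    """Softmax over logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (1.0, 0.7, 0.5):
    print(t, [round(p, 3) for p in temperature_probs(logits, t)])

print(sample_token(logits, 0.7, random.Random(0)))
```

Lower temperatures concentrate probability on the highest-scoring token, so `0.5` yields more deterministic text than `0.7`.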
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig
# Quantization config for base model
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4"
)
# Load with quantization (modeling_sardine.py must already be importable, e.g. downloaded as in the previous example)
from modeling_sardine import SaRDinEForCausalLM
model = SaRDinEForCausalLM.from_pretrained(
"MinimaML/SaRDinE-14B8x4P",
quantization_config=bnb_config,
device_map="auto"
)
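A back-of-the-envelope estimate shows why 4-bit loading matters for a model of this size. The sketch below uses the parameter counts quoted in this card (14B base, ~2.4B deltas) and assumes that only the base weights are quantized while the deltas stay in bfloat16. It ignores quantization constants, activations, and the KV cache, so real memory use will be higher.

```python
GB = 1e9  # decimal gigabytes

base_params = 14e9    # "14B base params" from this card
delta_params = 2.4e9  # "~2.4B (sparse deltas)" from this card


def weight_gb(params, bits_per_param):
    """Size of the raw weights in GB at a given precision."""
    return params * bits_per_param / 8 / GB


base_bf16 = weight_gb(base_params, 16)     # base model in bfloat16
base_nf4 = weight_gb(base_params, 4)       # base model in 4-bit NF4
deltas_bf16 = weight_gb(delta_params, 16)  # assumption: deltas are kept in bfloat16

print(f"bf16 base + deltas: {base_bf16 + deltas_bf16:.1f} GB")
print(f"nf4 base + deltas:  {base_nf4 + deltas_bf16:.1f} GB")
```

Under the 16-bit assumption the deltas come to about 4.8 GB, which is in line with the roughly 5GB listed for `srde_weights.pt`.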
| Component | Value |
|---|---|
| Base Model | Mistral-14B-Reasoning (frozen) |
| Trainable Parameters | ~2.4B (sparse deltas) |
| Experts per Layer | 8 |
| Top-K Routing | 2 |
| Current Sparsity | ~4% |
| Augmented Layers | 40 |
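The figures in the table can be combined into a few derived quantities. This is a rough sketch that assumes the trainable parameters are spread evenly over all experts and layers and that router parameters are negligible; neither assumption is confirmed by the card.

```python
trainable_params = 2.4e9  # "~2.4B (sparse deltas)"
base_params = 14e9        # "14B base params"
augmented_layers = 40
experts_per_layer = 8
top_k = 2

total_experts = augmented_layers * experts_per_layer
params_per_expert = trainable_params / total_experts
# With top-2 routing, 2 of the 8 experts in each layer are used at a time.
active_delta_params = trainable_params * top_k / experts_per_layer
delta_share = trainable_params / base_params

print(f"experts in total:        {total_experts}")
print(f"parameters per expert:   {params_per_expert / 1e6:.1f}M")
print(f"active delta parameters: {active_delta_params / 1e9:.1f}B")
print(f"deltas relative to base: {delta_share:.1%}")
```

Under these assumptions each expert holds about 7.5M parameters, and roughly 0.6B delta parameters are active at once on top of the frozen base.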

Data sources by domain:

| Domain | Data Sources |
|---|---|
| Math | GSM8K, MetaMathQA, Orca-Math |
| Logic | BigBench, CommonsenseQA, HellaSwag |
| Code | MBPP, HumanEval, CodeFeedback |
| Science | SciQ, ARC, MMLU |
| Planning | HotpotQA, SCROLLS |
| Abstract | BigBench, AQuA-RAT, Winogrande |

- `srde_weights.pt`: Trained SRDE delta weights (~5GB)
- `modeling_sardine.py`: Model architecture
- `configuration_sardine.py`: Configuration class
- `config.json`: Model config

@misc{sardine2025,
title={SaRDinE: Sparse Routed Delta Experts},
author={MinimaML},
year={2025},
url={https://github.com/MinimaML/srde-mistral}
}
License: Apache 2.0
Base model: mistralai/Ministral-3-14B-Base-2512