# MiniMax-M2.7 (MLX)
Mixed-precision MLX quantization of MiniMaxAI/MiniMax-M2.7 for Apple Silicon, prepared by baa.ai. Four size points are available: 100 GB, 111 GB, 116 GB, and 120 GB.
| Metric | Value |
|---|---|
| Size on disk | 100.1 GB (20 shards) |
| Group size | 64 |
| Framework | MLX (Apple Silicon) |
Recommended sampling parameters:

```python
sampler_params = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 40,
    "repetition_penalty": 1.1,
    "max_tokens": 8192,
}
```
MiniMax-M2.7 uses a `<think>…</think>` reasoning block. **Important:** the base chat template injects `<think>\n` at the end of the prompt before generation, so the model's output begins *inside* the reasoning block with no opening tag. Strip everything up to and including the first `</think>`:
```python
def strip_thinking(text: str) -> str:
    """Drop the reasoning block; the raw output has no opening <think> tag."""
    if "</think>" in text:
        return text.split("</think>", 1)[1].strip()
    return text.strip()
```
Give the model enough token budget to finish reasoning and emit the closing `</think>` tag: at least 4096, and 8192 for harder problems.
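If generation runs out of tokens before the closing tag appears, the answer is lost inside the reasoning block. A minimal sketch that detects this case (the `split_thinking` helper name is ours, not part of the release):

```python
def split_thinking(text: str) -> tuple[str, str, bool]:
    """Split raw output into (reasoning, answer, truncated).

    The model starts inside the reasoning block, so everything before the
    first </think> is reasoning. If the tag never appears, generation ran
    out of tokens mid-reasoning and `truncated` is True.
    """
    if "</think>" in text:
        reasoning, answer = text.split("</think>", 1)
        return reasoning.strip(), answer.strip(), False
    return text.strip(), "", True
```

When `truncated` is True, retry with a larger `max_tokens` rather than showing the partial reasoning to the user.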
```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler, make_logits_processors

model, tokenizer = load("baa-ai/MiniMax-M2.7-RAM-100GB-MLX")

sampler = make_sampler(temp=1.0, top_p=0.95, top_k=40)
logits_processors = make_logits_processors(repetition_penalty=1.1)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a Python function that reverses a string."}],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=8192,
    sampler=sampler,
    logits_processors=logits_processors,
)

# The output begins inside the reasoning block; drop it.
if "</think>" in response:
    response = response.split("</think>", 1)[1].strip()
print(response)
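For interactive use, the reasoning block can be dropped on the fly while streaming instead of after the fact. A sketch of the chunk-level filter, which works over any iterable of text pieces (the function name is ours; the buffering handles a `</think>` tag split across chunk boundaries):

```python
from typing import Iterable, Iterator


def filter_thinking(chunks: Iterable[str]) -> Iterator[str]:
    """Buffer streamed text until the first </think>, then pass the rest through."""
    buffer = ""
    in_thinking = True
    for chunk in chunks:
        if not in_thinking:
            yield chunk
            continue
        buffer += chunk
        if "</think>" in buffer:
            in_thinking = False
            tail = buffer.split("</think>", 1)[1].lstrip()
            if tail:
                yield tail
```

With `mlx_lm.stream_generate` (untested sketch, assuming each yielded response object exposes the new text as `.text`), this becomes `filter_thinking(r.text for r in stream_generate(model, tokenizer, prompt=prompt, max_tokens=8192))`.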
| Variant | Size | Link |
|---|---|---|
| 100 GB | 100.1 GB | baa-ai/MiniMax-M2.7-RAM-100GB-MLX |
| 111 GB | 110.9 GB | baa-ai/MiniMax-M2.7-RAM-111GB-MLX |
| 116 GB | 116.0 GB | baa-ai/MiniMax-M2.7-RAM-116GB-MLX |
| 120 GB | 120.1 GB | baa-ai/MiniMax-M2.7-RAM-120GB-MLX |
License: inherited from the upstream MiniMax-M2.7 license. Non-commercial use is permitted; commercial use requires written authorization from MiniMax.
- **Base model:** MiniMaxAI/MiniMax-M2.7
- **Quantization:** 4-bit (mixed precision, group size 64)
- **Quantized by:** baa.ai