Image-Text-to-Text
MLX
Safetensors
English
mistral3
mistral
Mixture of Experts
lean4
vision
conversational
4-bit precision
Instructions to use mvid/Leanstral-2603-MLX-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use mvid/Leanstral-2603-MLX-4bit with MLX:
# Make sure mlx-vlm is installed # pip install --upgrade mlx-vlm from mlx_vlm import load, generate from mlx_vlm.prompt_utils import apply_chat_template from mlx_vlm.utils import load_config # Load the model model, processor = load("mvid/Leanstral-2603-MLX-4bit") config = load_config("mvid/Leanstral-2603-MLX-4bit") # Prepare input image = ["http://images.cocodataset.org/val2017/000000039769.jpg"] prompt = "Describe this image." # Apply chat template formatted_prompt = apply_chat_template( processor, config, prompt, num_images=1 ) # Generate output output = generate(model, processor, formatted_prompt, image) print(output) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use mvid/Leanstral-2603-MLX-4bit with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "mvid/Leanstral-2603-MLX-4bit"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "mvid/Leanstral-2603-MLX-4bit" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use mvid/Leanstral-2603-MLX-4bit with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "mvid/Leanstral-2603-MLX-4bit"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default mvid/Leanstral-2603-MLX-4bit
Run Hermes
hermes
Leanstral-2603 MLX 4-bit
MLX 4-bit quantization of mistralai/Leanstral-2603.
- Architecture: Mistral-Small-4 — 119B total / 6.5B active per token, MoE (128 experts, 4 active), MLA attention, YARN + Llama-4 RoPE scaling, Pixtral vision encoder
- Quantization: 4-bit affine,
group_size=64, 4.594 bits/weight averagemlp.gate(router) kept at 8-bit per layerlm_headand the vision tower / multimodal projector kept at full precision
- Size: ~64 GB
- Format: MLX safetensors (Apple Silicon)
- Recommended hardware: Apple Silicon with >= 80 GB unified memory (M3 Ultra, M5 with 128 GB, etc.)
Usage
from mlx_vlm import load, generate
model, processor = load("mvid/Leanstral-2603-MLX-4bit")
output = generate(
model,
processor,
"Prove that the sum of two even numbers is even in Lean 4.",
max_tokens=4096,
)
print(output)
Conversion
Produced from the Mistral consolidated format via:
- A streaming variant of
transformers/models/mistral4/convert_mistral4_weight_to_hf.pythat fuses MoE experts layer-by-layer instead of holding the full state dict in RAM, keeping FP8 storage in the HF intermediate (peak memory ~14 GB). mlx_vlm convert -q --q-bits 4 --q-group-size 64 --dtype bfloat16with a per-tensormx.eval+mx.synchronizesave patch to avoid macOS Metal command-buffer watchdog timeouts on the 119B model.
License: Apache 2.0 (inherited from the base model).
- Downloads last month
- 61
Model size
19B params
Tensor type
BF16
·
U32 ·
Hardware compatibility
Log In to add your hardware
4-bit
Model tree for mvid/Leanstral-2603-MLX-4bit
Base model
mistralai/Leanstral-2603