DeepSeek V4 Flash MLX Q3 Mixed

This is an MLX conversion of deepseek-ai/DeepSeek-V4-Flash.

Source

  • Base model: deepseek-ai/DeepSeek-V4-Flash
  • Source revision: 6e763230a9d263eca2023f1d4a5ce1bfe126cf48
  • Architecture: DeepseekV4ForCausalLM
  • Model type: deepseek_v4

Conversion Recipe

  • Tooling: Thump604/mlx-lm, branch deepseek-v4-support-fixes
  • Minimum tooling commit for generation: 9c990f4
  • Output path during conversion: /Volumes/Lexar/mlx_models/DeepSeek-V4-Flash-MLX-Q3-mixed-gs128-affine
  • Quantization recipe: mixed_3_6
  • Quantization mode: affine
  • Group size: 128
  • Effective bits per weight reported by MLX: 3.808
  • Shards: 28
  • Indexed MLX tensor size: 135,346,422,876 bytes
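As a rough cross-check of the reported 3.808 effective bits per weight: assuming MLX's affine mode stores one fp16 scale and one fp16 bias per group of 128 weights (an assumption about the storage layout, adding 0.25 bits/weight of overhead), the 6-bit share of the mix can be back-calculated:

```python
# Hypothetical back-calculation of the 6-bit fraction from the reported
# effective bits/weight. Assumes fp16 scale + fp16 bias per group of 128.
GROUP = 128
OVERHEAD = (16 + 16) / GROUP          # 0.25 bits/weight for scale + bias
eff_3 = 3 + OVERHEAD                  # 3.25 effective bits for 3-bit groups
eff_6 = 6 + OVERHEAD                  # 6.25 effective bits for 6-bit groups
reported = 3.808

frac_6bit = (reported - eff_3) / (eff_6 - eff_3)
print(f"~{frac_6bit:.0%} of weights at 6-bit")   # roughly 19%
```

Under these assumptions, roughly one fifth of the weights sit on the 6-bit path, which is plausible given the list of sensitive components below.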

The mixed recipe uses 3-bit affine quantization for lower-risk routed expert paths and 6-bit affine quantization for sensitive paths:

  • embeddings and the LM head
  • attention projections
  • compressed-attention/indexer components
  • shared experts
  • selected down projections
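A quantization predicate of roughly this shape could implement the split. This is an illustrative sketch, not the exact predicate used for this conversion: the path substrings are assumptions about the checkpoint's module names, and the selection logic for the "selected down projections" is omitted.

```python
# Illustrative mixed_3_6-style predicate: 6-bit affine for sensitive paths,
# 3-bit affine for everything else. Marker strings are assumptions.
SENSITIVE_MARKERS = (
    "embed_tokens",      # embeddings
    "lm_head",           # LM head
    "self_attn",         # attention projections
    "indexer",           # compressed-attention/indexer components
    "shared_experts",    # shared experts
)

def bits_for(path: str, group_size: int = 128) -> dict:
    """Return an affine quantization spec for a weight path."""
    bits = 6 if any(m in path for m in SENSITIVE_MARKERS) else 3
    return {"group_size": group_size, "bits": bits}
```

In mlx-lm, a callable of this shape can typically be supplied to the converter's quant-predicate hook; check the deepseek-v4-support-fixes branch for the exact interface it expects.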

Validation

  • Conversion completed successfully.
  • Lazy MLX load completed successfully on a 128GB Mac Studio.
  • A one-token generation smoke test was attempted and aborted after memory pressure and swap activity exceeded the local safety boundary. Treat this artifact as converted and load-validated, not generation-qualified, on 128GB Apple Silicon.
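The memory-pressure outcome is consistent with simple arithmetic: the 135,346,422,876 bytes of indexed tensors come to about 126 GiB, leaving only around 2 GiB of the machine's 128GB unified memory for activations, KV cache, and the OS.

```python
# Why lazy load succeeds but generation swaps on a 128GB Mac Studio:
# the quantized weights alone nearly fill unified memory.
WEIGHT_BYTES = 135_346_422_876      # indexed MLX tensor size reported above
RAM_BYTES = 128 * 1024**3           # 128GB unified memory

weights_gib = WEIGHT_BYTES / 1024**3
headroom_gib = (RAM_BYTES - WEIGHT_BYTES) / 1024**3
print(f"weights ~{weights_gib:.1f} GiB, headroom ~{headroom_gib:.1f} GiB")
# → weights ~126.1 GiB, headroom ~1.9 GiB
```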

Notes

DeepSeek V4 support in MLX is still under active development. This artifact was produced with local DeepSeek V4 support fixes, including:

  • FP4/FP8 checkpoint handling
  • reinterpretation of F8_E8M0 scale metadata as raw uint8 exponent bytes before sanitizer decode
  • attention sink dtype handling
  • quantized grouped output projection support
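The F8_E8M0 reinterpretation follows the OCP Microscaling convention, where each scale byte is an unsigned power-of-two exponent with bias 127 and 0xFF reserved for NaN. A minimal decoder sketch (illustrating the format, not the branch's actual decode path):

```python
import math

def decode_e8m0(byte: int) -> float:
    """Decode an F8_E8M0 scale byte (OCP MX format): 2**(e - 127); 0xFF is NaN."""
    if byte == 0xFF:
        return math.nan
    return 2.0 ** (byte - 127)

# e.g. 127 -> 1.0, 130 -> 8.0, 120 -> 1/128
```

Reading these bytes as raw uint8 exponents, rather than as a float dtype, avoids mangling the scales before they reach the sanitizer.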

Model Stats

  • Model size: 284B params
  • Tensor types: BF16, U32, F32, I64