ik_llama.cpp compatible quants of MiMo-V2.5-Pro

ik_llama.cpp doesn't support the fused attn_qkv tensor, so these files are converted from mainline quants.

Tested and working with the main branch of ik_llama.cpp.

Changes made during conversion:

  • Un-fused attn_qkv.weight -> attn_q.weight, attn_k.weight, attn_v.weight
  • Dropped MTP tensors


MiMo-V2.5-Pro-IQ2_XXS-unfused.gguf - converted from bartowski/MiMo-V2.5-Pro-GGUF

MiMo-V2.5-Pro-IQ2_S-unfused.gguf - converted from AesSedai/MiMo-V2.5-Pro-GGUF

MiMo-V2.5-Pro-IQ3_S-unfused.gguf - converted from AesSedai/MiMo-V2.5-Pro-GGUF

architecture: mimo2
q_size=24576  head_dim=192  v_head_dim=128  layers=73  kv_heads=array  mtp=3  effective_layers=70
split 70 fused qkv tensors, dropped 36 mtp tensors
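
The un-fusing step reported in the log above can be sketched as a row-wise split. This is a minimal illustration, not the script that produced these files: it assumes the fused tensor stacks Q, then K, then V along the output (row) dimension, and the helper names `qkv_row_ranges` and `unfuse` are hypothetical. `q_size`, `head_dim` and `v_head_dim` come from the log; `n_kv_heads=8` is a made-up value, since the log only reports `kv_heads=array` (it varies per layer).

```python
def qkv_row_ranges(q_size, n_kv_heads, head_dim, v_head_dim):
    """Return (start, stop) row ranges for Q, K and V inside a fused tensor.

    ASSUMPTION: rows are laid out as [Q; K; V]. The model card does not
    state the layout, so treat this as illustrative only.
    """
    k_size = n_kv_heads * head_dim      # K uses head_dim per KV head
    v_size = n_kv_heads * v_head_dim    # V uses its own, smaller head dim
    q_end = q_size
    k_end = q_end + k_size
    v_end = k_end + v_size
    return {
        "attn_q.weight": (0, q_end),
        "attn_k.weight": (q_end, k_end),
        "attn_v.weight": (k_end, v_end),
    }


def unfuse(rows, ranges):
    """Slice a sequence of rows into the three named tensors."""
    return {name: rows[start:stop] for name, (start, stop) in ranges.items()}


# q_size, head_dim, v_head_dim are from the conversion log.
# n_kv_heads=8 is HYPOTHETICAL; the real value differs per layer.
ranges = qkv_row_ranges(q_size=24576, n_kv_heads=8, head_dim=192, v_head_dim=128)

# Stand-in for the fused tensor: one integer per row instead of real weights.
fused_rows = list(range(ranges["attn_v.weight"][1]))
parts = unfuse(fused_rows, ranges)

for name, (start, stop) in ranges.items():
    print(f"{name}: rows {start}..{stop - 1} ({stop - start} rows)")
```

With the logged values, `q_size / head_dim` gives 24576 / 192 = 128 query heads. Since GGUF block quantization groups values along each row, a row-wise split like this should not need requantizing, though the card does not say so explicitly.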

Perplexity Tests

iq2_s

Final estimate: PPL over 584 chunks for n_ctx=512 = 4.0549 +/- 0.02288

llama_print_timings:        load time =  191864.40 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time = 4322704.22 ms / 299008 tokens (   14.46 ms per token,    69.17 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time = 4326930.78 ms / 299009 tokens

iq2_xxs

Final estimate: PPL over 584 chunks for n_ctx=512 = 4.5954 +/- 0.02684

llama_print_timings:        load time =  149084.44 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time = 4160557.52 ms / 299008 tokens (   13.91 ms per token,    71.87 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time = 4164868.84 ms / 299009 tokens
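
As a sanity check on the numbers above, the short sketch below recomputes the token count, per-token latency and throughput from the logged prompt eval times, and compares the two perplexities. The figures are copied from the logs; the helper name `throughput` is hypothetical.

```python
def throughput(total_ms, n_tokens):
    """Return (ms per token, tokens per second) for a timed run."""
    ms_per_token = total_ms / n_tokens
    tokens_per_s = n_tokens / (total_ms / 1000.0)
    return ms_per_token, tokens_per_s


# 584 chunks at n_ctx=512, as reported in the "Final estimate" lines.
n_tokens = 584 * 512

# (prompt eval time in ms, final PPL estimate), copied from the logs above.
runs = {
    "iq2_s": (4322704.22, 4.0549),
    "iq2_xxs": (4160557.52, 4.5954),
}

for name, (ms, ppl) in runs.items():
    ms_per_token, tokens_per_s = throughput(ms, n_tokens)
    print(f"{name}: {ms_per_token:.2f} ms/token, {tokens_per_s:.2f} tokens/s, PPL {ppl}")

# Relative perplexity gap between the two quants.
ppl_ratio = runs["iq2_xxs"][1] / runs["iq2_s"][1]
print(f"iq2_xxs PPL is {100 * (ppl_ratio - 1):.1f}% higher than iq2_s")
```

The recomputed values match the logs (299008 tokens, 69.17 and 71.87 tokens per second), and the IQ2_XXS perplexity comes out roughly 13% higher than IQ2_S, a gap well outside the reported +/- error bars.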