Not even close to an ideal quantization: I just requantized am17an/Qwen3.6-35BA3B-MTP-GGUF (which is Q8_0) down to Q4_0.
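For reference, this is roughly what that requantization step looks like, assuming the `llama-quantize` tool from the same llama.cpp build (the input path is illustrative; the output path matches the `-m` flag below):

```bash
# Requantize the Q8_0 GGUF down to Q4_0 (input path is illustrative)
./llama.cpp/build/bin/llama-quantize \
    ./Models/Qwen3.6-35BA3B-MTP-Q8_0.gguf \
    ./Models/Qwen3.6-35BA3B-MTP-Q4_0.gguf \
    Q4_0
```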

Made using this llama.cpp PR.

Flags I use for inference (25–37 tk/s, a ~1.5x speedup):

```bash
./llama.cpp/build/bin/llama-server --host 0.0.0.0 --port 5000 \
    -m ./Models/Qwen3.6-35BA3B-MTP-Q4_0.gguf --fit on \
    -a "Qwen3.6-35B-A3B" -c 200000 \
    --top-k 20 --top-p 0.95 --min-p 0 --repeat-penalty 1.0 --presence-penalty 0.0 \
    -fa on --temp 0.6 -ctk q8_0 -ctv q8_0 --batch-size 4096 \
    --reasoning off --no-mmap --jinja -t 6 --checkpoint-every-n-tokens 4096 --ctx-checkpoints 64 -fitt 512 \
    --spec-draft-n-max 2
```
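Once it's running, llama-server exposes an OpenAI-compatible HTTP API; a quick smoke test against the alias set with `-a` above (the prompt is illustrative):

```bash
# Hit the OpenAI-compatible chat endpoint on the port configured above
curl http://localhost:5000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "Qwen3.6-35B-A3B",
          "messages": [{"role": "user", "content": "Hello!"}],
          "temperature": 0.6
        }'
```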
GGUF · 4-bit (Q4_0) · 36B params · qwen35moe architecture
