Not even close to an ideal quantization: I just requantized am17an/Qwen3.6-35BA3B-MTP-GGUF (which is Q8_0) down to Q4_0.
Made using this llama.cpp PR
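For reference, this kind of requantization is done with llama.cpp's `llama-quantize` tool. A minimal sketch (the paths are illustrative, and the binary is assumed to be built from the PR linked above):

```bash
# Requantize the Q8_0 GGUF down to Q4_0 (paths are illustrative)
./llama.cpp/build/bin/llama-quantize \
  ./Models/Qwen3.6-35BA3B-MTP-Q8_0.gguf \
  ./Models/Qwen3.6-35BA3B-MTP-Q4_0.gguf \
  Q4_0
```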
Flags I use for inference (25 tk/s to 37 tk/s, ~1.5x speedup):
```bash
./llama.cpp/build/bin/llama-server --host 0.0.0.0 --port 5000 \
  -m ./Models/Qwen3.6-35BA3B-MTP-Q4_0.gguf --fit on \
  -a "Qwen3.6-35B-A3B" -c 200000 \
  --top-k 20 --top-p 0.95 --min-p 0 --repeat-penalty 1.0 --presence-penalty 0.0 \
  -fa on --temp 0.6 -ctk q8_0 -ctv q8_0 --batch-size 4096 \
  --reasoning off --no-mmap --jinja -t 6 --checkpoint-every-n-tokens 4096 --ctx-checkpoints 64 -fitt 512 \
  --spec-draft-n-max 2
```
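Once the server is up, it exposes an OpenAI-compatible API, so a quick smoke test from the shell looks like this (the prompt is just an example; the model name matches the `-a` alias above):

```bash
# Query the running llama-server via its OpenAI-compatible chat endpoint
curl http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.6-35B-A3B",
    "messages": [{"role": "user", "content": "Hello! What can you do?"}]
  }'
```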
Base model: am17an/Qwen3.6-35BA3B-MTP-GGUF
To run the model with llama-cpp-python:

```python
# !pip install llama-cpp-python
from llama_cpp import Llama

# Download the quantized GGUF from the Hub and load it
llm = Llama.from_pretrained(
    repo_id="bombdefuser-124/Qwen3.6-35BA3B-MTP-Q4_0-GGUF",
    filename="Qwen3.6-35BA3B-MTP-Q4_0.gguf",
)

# messages must be a list of chat messages, not a string
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Hello! What can you do?"}  # example prompt
    ]
)
```
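If you want local calls to match the server flags above, `create_chat_completion` also accepts sampling parameters such as `temperature`, `top_p`, `top_k`, and `min_p`; passing `temperature=0.6, top_p=0.95, top_k=20` should reproduce the same sampling behaviour.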