Use with the llama-cpp-python library
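# !pip install llama-cpp-python huggingface-hub
# Assumes a llama-cpp-python release whose bundled llama.cpp is b9482 or later (Mellum2 support).

from llama_cpp import Llama

# Download one of the quantized files listed below from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
	repo_id="CodeFault/Mellum2-12B-A2.5B-Instruct-GGUF",
	filename="Mellum2-12B-A2.5B-Instruct-Q5_K_M.gguf",
)

response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# The result follows the OpenAI chat-completion format.
print(response["choices"][0]["message"]["content"])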

Mellum2-12B-A2.5B-Instruct - GGUF

Quantized GGUF versions of JetBrains/Mellum2-12B-A2.5B-Instruct, generated with llama-quantize (llama.cpp b9482) using its default settings.
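The exact commands are not given here. As a minimal sketch, assuming the BF16 GGUF (the baseline in the perplexity table) was the input and that each file came from a plain `llama-quantize <input> <output> <type>` call with no extra flags, the set of files could be regenerated from Python as follows. `MODEL`, `QUANT_TYPES`, `quantize_command` and `run_all` are hypothetical helpers, not part of llama.cpp.

```python
import shlex
import subprocess
from typing import List

# Hypothetical naming, following the file names used in the tables of this card.
MODEL = "Mellum2-12B-A2.5B-Instruct"
QUANT_TYPES = ["Q4_0", "Q4_K_S", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]


def quantize_command(quant_type: str, model: str = MODEL) -> List[str]:
    """Build a llama-quantize call with default settings (no extra flags).

    Assumption: the BF16 GGUF is the quantization input.
    """
    source = f"{model}-BF16.gguf"
    target = f"{model}-{quant_type}.gguf"
    return ["llama-quantize", source, target, quant_type]


def run_all(dry_run: bool = True) -> None:
    """Print every command; execute them only when dry_run is False."""
    for quant_type in QUANT_TYPES:
        cmd = quantize_command(quant_type)
        print(shlex.join(cmd))
        if not dry_run:
            # Assumes a llama.cpp build (b9482 or later) with llama-quantize on PATH.
            subprocess.run(cmd, check=True)
```

`run_all()` only prints the commands; `run_all(dry_run=False)` actually runs them.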

Quantizations provided

| File | Quantization | Size |
|---|---|---|
| Mellum2-12B-A2.5B-Instruct-Q4_0.gguf | Q4_0¹ | 6.91 GB |
| Mellum2-12B-A2.5B-Instruct-Q4_K_S.gguf | Q4_K_S | 7.4 GB |
| Mellum2-12B-A2.5B-Instruct-Q4_K_M.gguf | Q4_K_M | 8.07 GB |
| Mellum2-12B-A2.5B-Instruct-Q5_K_M.gguf | Q5_K_M | 9.21 GB |
| Mellum2-12B-A2.5B-Instruct-Q6_K.gguf | Q6_K | 10.9 GB |
| Mellum2-12B-A2.5B-Instruct-Q8_0.gguf | Q8_0 | 12.9 GB |

¹ Q4_0 is not recommended. Its perplexity increased significantly, which suggests degraded quality. Unlike the thinking variant at Q4_0, however, I did not encounter endlessly repeating tokens.
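Dividing file size by parameter count gives a rough sense of how aggressive each quantization is and which file fits a given memory budget. This sketch assumes the sizes are decimal gigabytes (10^9 bytes) and uses the rounded 12B parameter count listed for this model, so the numbers are approximate. `bits_per_weight` and `largest_fitting` are hypothetical helpers, and the budget check covers the weights only, not the KV cache or runtime overhead.

```python
from typing import Optional

# Sizes from the table above; assumed to be decimal gigabytes.
SIZES_GB = {
    "Q4_0": 6.91,
    "Q4_K_S": 7.4,
    "Q4_K_M": 8.07,
    "Q5_K_M": 9.21,
    "Q6_K": 10.9,
    "Q8_0": 12.9,
}
# Rounded parameter count (12B), so results are only approximate.
PARAMS = 12e9


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Average stored bits per parameter, including file metadata."""
    return size_gb * 1e9 * 8 / params


def largest_fitting(budget_gb: float) -> Optional[str]:
    """Largest listed file whose weights fit in budget_gb (ignores KV cache)."""
    fitting = [q for q, size in SIZES_GB.items() if size <= budget_gb]
    return max(fitting, key=SIZES_GB.__getitem__) if fitting else None


for quant, size in SIZES_GB.items():
    print(f"{quant:7s} ~{bits_per_weight(size):.2f} bits/weight")
```

For example, Q8_0 comes out at about 8.6 bits per weight, slightly above its nominal 8.5; a small gap is expected because 12B is a rounded figure and the file also carries metadata. `largest_fitting(10.0)` picks Q5_K_M.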

Perplexity test

I measured perplexity with llama-perplexity on Salesforce's wikitext-2-raw-v1 dataset (lower is better).

| File | Quantization | Context (tokens) | PPL (± error) |
|---|---|---|---|
| Mellum2-12B-A2.5B-Instruct-Q4_0.gguf | Q4_0 | 512 | 180.4777 ± 3.04078 |
| Mellum2-12B-A2.5B-Instruct-Q4_K_S.gguf | Q4_K_S | 512 | 14.1798 ± 0.13170 |
| Mellum2-12B-A2.5B-Instruct-Q4_K_M.gguf | Q4_K_M | 512 | 14.0943 ± 0.13090 |
| Mellum2-12B-A2.5B-Instruct-Q5_K_M.gguf | Q5_K_M | 512 | 14.6015 ± 0.13815 |
| Mellum2-12B-A2.5B-Instruct-Q6_K.gguf | Q6_K | 512 | 17.7939 ± 0.18152 |
| Mellum2-12B-A2.5B-Instruct-Q8_0.gguf | Q8_0 | 512 | 13.2165 ± 0.12008 |
| Mellum2-12B-A2.5B-Instruct-BF16.gguf | BF16 | 512 | 14.7864 ± 0.14076 |
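Expressing each result relative to the BF16 baseline makes the table easier to compare. The sketch below copies the numbers from the table and computes the percentage change in perplexity plus a crude check of whether the two reported ± intervals overlap. The overlap test is a rough heuristic chosen for illustration, not a statistical significance test, and `relative_change` and `within_error` are hypothetical helpers.

```python
# (PPL, reported +/- error) at context 512, copied from the table above.
RESULTS = {
    "Q4_0": (180.4777, 3.04078),
    "Q4_K_S": (14.1798, 0.13170),
    "Q4_K_M": (14.0943, 0.13090),
    "Q5_K_M": (14.6015, 0.13815),
    "Q6_K": (17.7939, 0.18152),
    "Q8_0": (13.2165, 0.12008),
    "BF16": (14.7864, 0.14076),
}
BASELINE = "BF16"


def relative_change(quant: str, baseline: str = BASELINE) -> float:
    """Percent change in perplexity versus the baseline (positive = higher PPL)."""
    ppl, _ = RESULTS[quant]
    base, _ = RESULTS[baseline]
    return 100.0 * (ppl - base) / base


def within_error(quant: str, baseline: str = BASELINE) -> bool:
    """Crude heuristic: do the two reported +/- intervals overlap?"""
    ppl, err = RESULTS[quant]
    base, base_err = RESULTS[baseline]
    return abs(ppl - base) <= err + base_err


for quant in RESULTS:
    if quant != BASELINE:
        print(f"{quant:7s} {relative_change(quant):+8.1f}%  overlaps BF16: {within_error(quant)}")
```

On these numbers Q4_0 is roughly 1,120% above BF16, while Q5_K_M is the only quantization whose interval overlaps the baseline; the ordering is not monotonic in bit width.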

Serving with llama.cpp

llama.cpp added support for Mellum2 in release b9482. The model supports a maximum context length of 131,072 tokens. It can be served with:

llama-server \
  -hf CodeFault/Mellum2-12B-A2.5B-Instruct-GGUF:Q5_K_M \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20
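llama-server exposes an OpenAI-compatible HTTP API, so once the command above is running the model can be queried with Python's standard library alone. This is a sketch assuming the server's default address (`http://127.0.0.1:8080`); `build_payload` and `chat` are hypothetical helpers. Sampling values sent per request override the server defaults, so they are repeated here only to make the request self-describing.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default address.
BASE_URL = "http://127.0.0.1:8080"


def build_payload(prompt: str) -> dict:
    """Chat request mirroring the sampling settings of the llama-server command above."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,  # llama.cpp extension to the OpenAI request format
    }


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST to the OpenAI-compatible chat endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` sends the same prompt as the llama-cpp-python example.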
Format: GGUF
Model size: 12B params
Architecture: mellum