How to use with vLLM
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cmarkea/Qwen2.5-32B-Instruct-4bit"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cmarkea/Qwen2.5-32B-Instruct-4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
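The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

# Assumption: the vLLM server from the previous step is running locally on port 8000.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "cmarkea/Qwen2.5-32B-Instruct-4bit"


def build_chat_request(prompt, model=MODEL, url=BASE_URL):
    """Build (without sending) the same POST request as the curl command."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    # OpenAI-compatible responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
# print(ask("What is the capital of France?"))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.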
Use Docker
docker model run hf.co/cmarkea/Qwen2.5-32B-Instruct-4bit

A 4-bit version of Qwen2.5-32B-Instruct, quantized with bitsandbytes. For more information about the model, refer to the original model's page.

Impact on performance

Impact of quantization on a set of models.

The model was evaluated using the PoLL (Pool of LLM) technique on 100 French questions. Scores are aggregated from six evaluations, two from each of three evaluators: GPT-4o, Gemini-1.5-pro, and Claude3.5-sonnet.
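As an illustration of how such an aggregation might work, the sketch below averages the six grades for each question and then averages over questions. The grades are made up, and the use of a plain mean is an assumption: the exact aggregation rule used for the evaluation is not specified here.

```python
from statistics import mean

# Hypothetical grades (out of 5) for two questions.
# Each of the three evaluators grades every answer twice, giving six grades per question.
scores = {
    "question_1": {"gpt-4o": [4, 5], "gemini-1.5-pro": [4, 4], "claude-3.5-sonnet": [3, 4]},
    "question_2": {"gpt-4o": [5, 5], "gemini-1.5-pro": [4, 5], "claude-3.5-sonnet": [4, 4]},
}


def poll_score(scores):
    """Average the six grades per question, then average across questions.

    Assumption: a simple mean is used at both levels.
    """
    per_question = [
        mean(grade for grades in evaluators.values() for grade in grades)
        for evaluators in scores.values()
    ]
    return mean(per_question)


print(poll_score(scores))  # 4.25 for the hypothetical grades above
```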

Performance Scores (on a scale of 5):

| Model | Score | # params (billion) | Size (GB) |
|---|---|---|---|
| gpt-4o | 4.13 | N/A | N/A |
| gpt-4o-mini | 4.02 | N/A | N/A |
| Qwen/Qwen2.5-32B-Instruct | 3.99 | 32.8 | 65.6 |
| cmarkea/Qwen2.5-32B-Instruct-4bit | 3.98 | 32.8 | 16.4 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 3.71 | 46.7 | 93.4 |
| cmarkea/Mixtral-8x7B-Instruct-v0.1-4bit | 3.68 | 46.7 | 23.35 |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 3.68 | 70.06 | 140.12 |
| gpt-3.5-turbo | 3.66 | 175 | 350 |
| cmarkea/Meta-Llama-3.1-70B-Instruct-4bit | 3.64 | 70.06 | 35.3 |
| TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ | 3.56 | 46.7 | 46.7 |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 3.25 | 8.03 | 16.06 |
| mistralai/Mistral-7B-Instruct-v0.2 | 1.98 | 7.25 | 14.5 |
| cmarkea/bloomz-7b1-mt-sft-chat | 1.69 | 7.07 | 14.14 |
| cmarkea/bloomz-3b-dpo-chat | 1.68 | 3 | 6 |
| cmarkea/bloomz-3b-sft-chat | 1.51 | 3 | 6 |
| croissantllm/CroissantLLMChat-v0.1 | 1.19 | 1.3 | 2.7 |
| cmarkea/bloomz-560m-sft-chat | 1.04 | 0.56 | 1.12 |
| OpenLLM-France/Claire-Mistral-7B-0.1 | 0.38 | 7.25 | 14.5 |

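The impact of quantization is negligible: the 4-bit model scores 3.98 versus 3.99 for the original Qwen/Qwen2.5-32B-Instruct, while its size drops from 65.6 GB to 16.4 GB.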
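The sizes listed for the two Qwen2.5-32B-Instruct rows are consistent with a simple estimate: number of parameters times bytes per parameter. The sketch below reproduces them, assuming 16-bit weights for the original model and 4 bits per weight for the quantized one. It is an approximation that ignores quantization metadata and any layers kept in higher precision.

```python
def weight_size_gb(params_billion, bits_per_param):
    """Approximate weight size in GB: parameters (in billions) times bytes per parameter."""
    return params_billion * bits_per_param / 8


full = weight_size_gb(32.8, 16)   # original model, 16-bit weights (assumed)
quant = weight_size_gb(32.8, 4)   # 4-bit quantized weights

print(f"{full:.1f} GB -> {quant:.1f} GB")  # 65.6 GB -> 16.4 GB
```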

Prompt Pattern

As a reminder, prompts must follow this pattern to interact with the model:

<|im_start|>user\n{user_prompt_1}<|im_end|>\n<|im_start|>assistant\n{model_answer_1}...
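A minimal sketch of building this pattern by hand is shown below, assuming the `\n` in the pattern stands for a newline character. The function name `build_prompt` is illustrative. In practice, the tokenizer's `apply_chat_template` method in transformers usually handles this formatting, and may add elements such as a system message that this sketch omits.

```python
def build_prompt(messages):
    """Format a list of {"role", "content"} messages into the pattern above.

    The prompt ends with an open assistant turn so the model writes the next answer.
    """
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_prompt([{"role": "user", "content": "What is the capital of France?"}])
print(prompt)
```

For multi-turn conversations, earlier assistant answers are passed as messages with the role `assistant`, so each turn is wrapped in the same start and end markers.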
Safetensors · Model size: 34B params · Tensor types: F32, BF16, U8