VLLM chat template / tool calling issue

#18

by skyqu - opened Apr 12

Discussion

skyqu

Apr 12

I was having trouble with tool calling in vLLM 0.19: "no tool calling output detected".
When serving via vLLM's OpenAI-compatible endpoint, tool role messages have their content field passed as a list of content blocks (e.g. [{"type": "text", "text": "..."}]) rather than as a plain string.
In the chat template, this falls through to the branch that re-renders tool schemas via tool_to_json(), so the model sees tool signatures instead of the actual tool output.
Also, visible_text() doesn't guard against None content, which can then show up as the literal string "None" in the final prompt.
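
To make the two content shapes concrete, here is a minimal Python sketch of the check involved. The message dictionaries, the `tool_call_id` value, and the helper name `takes_string_branch` are illustrative assumptions, not anything taken from vLLM or the GLM-5.1 template; the `isinstance` check only mirrors the Jinja `is string` test.

```python
# Two shapes a tool-role message's content can take when the chat template sees it.
# Both messages are made-up examples for illustration.
tool_msg_string = {
    "role": "tool",
    "tool_call_id": "call_0",  # hypothetical id
    "content": "22 C, sunny",
}
tool_msg_blocks = {
    "role": "tool",
    "tool_call_id": "call_0",  # hypothetical id
    "content": [{"type": "text", "text": "22 C, sunny"}],
}


def takes_string_branch(message):
    """Python analogue of the template's `m.content is string` test."""
    return isinstance(message["content"], str)


print(takes_string_branch(tool_msg_string))  # True
print(takes_string_branch(tool_msg_blocks))  # False: falls through to the else branch

# Stringifying a missing value yields the literal text "None",
# which is how it can leak into a rendered prompt.
print(str(None))  # None
```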

This patch fixes it for me; I'm not sure whether it will be helpful to others:

--- <unnamed>
+++ <unnamed>
@@ -32,7 +32,8 @@
 For each function call, output the function name and arguments within the following XML format:
 <tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
 {%- macro visible_text(content) -%}
-    {%- if content is string -%}
+    {%- if content is none -%}
+    {%- elif content is string -%}
         {{- content }}
     {%- elif content is iterable and content is not mapping -%}
         {%- for item in content -%}
@@ -94,20 +95,32 @@
 {%- endif %}
 {%- if m.content is string -%}
     {{- '<tool_response>' + m.content + '</tool_response>' -}}
+{%- elif m.content is iterable and m.content is not mapping -%}
+    {%- set ns_tool_content = namespace(text='') -%}
+    {%- for part in m.content -%}
+        {%- if part is mapping and part.type == 'text' -%}
+            {%- set ns_tool_content.text = ns_tool_content.text + part.text -%}
+        {%- endif -%}
+    {%- endfor -%}
+    {%- if ns_tool_content.text -%}
+        {{- '<tool_response>' + ns_tool_content.text + '</tool_response>' -}}
+    {%- else -%}
+        {{- '<tool_response><tools>\n' -}}
+        {%- for tr in m.content -%}
+            {%- for tool in tools -%}
+                {%- if 'function' in tool -%}
+                    {%- set tool = tool['function'] -%}
+                {%- endif -%}
+                {%- if tool.name == tr.name -%}
+                    {{- tool_to_json(tool) + '\n' -}}
+                {%- endif -%}
+            {%- endfor -%}
+        {%- endfor -%}
+        {{- '</tools></tool_response>' -}}
+    {%- endif -%}
 {%- else -%}
-    {{- '<tool_response><tools>\n' -}}
-    {% for tr in m.content %}
-        {%- for tool in tools -%}
-            {%- if 'function' in tool -%}
-                {%- set tool = tool['function'] -%}
-            {%- endif -%}
-            {%- if tool.name == tr.name -%}
-                {{- tool_to_json(tool) + '\n' -}}
-            {%- endif -%}
-        {%- endfor -%}
-    {%- endfor -%}
-    {{- '</tools></tool_response>' -}}
-{% endif -%}
+    {{- '<tool_response>' + m.content | string + '</tool_response>' -}}
+{%- endif -%}
 {%- elif m.role == 'system' -%}
 <|system|>{{ visible_text(m.content) }}
 {%- endif -%}
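
The same normalization can be sketched outside the template, for example to pre-process messages on the client before they reach the server. This is a simplified Python analogue of the patch, not the template itself: the function names are hypothetical, the schema fallback for lists without text parts is omitted, and `None` is mapped to an empty string (following the `visible_text` guard rather than the tool branch's final `| string` fallback).

```python
def tool_content_to_text(content):
    """Collapse a tool message's content into plain text.

    Handles the three shapes discussed in this thread:
    None, a plain string, or a list of content blocks.
    """
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, (list, tuple)):
        # Keep only text blocks and concatenate them in order.
        return "".join(
            part.get("text") or ""
            for part in content
            if isinstance(part, dict) and part.get("type") == "text"
        )
    # Anything else is stringified, like the patch's final else branch.
    return str(content)


def wrap_tool_response(content):
    """Wrap the flattened text in the tags used by the template."""
    return "<tool_response>" + tool_content_to_text(content) + "</tool_response>"


print(wrap_tool_response("22 C, sunny"))
print(wrap_tool_response([{"type": "text", "text": "22 C, sunny"}]))
# Both lines print: <tool_response>22 C, sunny</tool_response>
```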

skyqu changed discussion title from VLLM chat template issue to VLLM chat template / tool calling issue Apr 12

JaredforReal

Apr 13

thanks, working on it

JaredforReal

Apr 13

Check this PR: https://github.com/vllm-project/vllm/pull/39253
Use vLLM's main branch to serve GLM-5.1 with MTP, then add --chat-template-content-format=string to the launch command to avoid chat-template-related issues. Thanks!
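
As a rough sketch of the launch command described above, the snippet below assembles it in Python. Only the `--chat-template-content-format=string` flag and the model name come from this thread; using `vllm serve` as the entry point is an assumption, and the MTP-related options are left out because the thread does not spell them out.

```python
import shlex

MODEL = "zai-org/GLM-5.1"


def build_serve_command(model=MODEL):
    """Return the argv list for serving the model with string-format content.

    Assumes the `vllm` CLI is installed and on PATH; MTP options are
    intentionally omitted because the thread does not list them.
    """
    return ["vllm", "serve", model, "--chat-template-content-format=string"]


command = build_serve_command()
print(shlex.join(command))
# vllm serve zai-org/GLM-5.1 --chat-template-content-format=string
```

On a machine with vLLM installed, the `command` list could be passed to `subprocess.run` to start the server.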

skyqu changed discussion status to closed Apr 13
