Instructions to use zai-org/GLM-5.1 with libraries and local apps.
- Libraries
- Transformers
How to use zai-org/GLM-5.1 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="zai-org/GLM-5.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-5.1")
model = AutoModelForCausalLM.from_pretrained("zai-org/GLM-5.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (everything after the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use zai-org/GLM-5.1 with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "zai-org/GLM-5.1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "zai-org/GLM-5.1",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```bash
docker model run hf.co/zai-org/GLM-5.1
```
- SGLang
How to use zai-org/GLM-5.1 with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "zai-org/GLM-5.1" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "zai-org/GLM-5.1",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "zai-org/GLM-5.1" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "zai-org/GLM-5.1",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use zai-org/GLM-5.1 with Docker Model Runner:
```bash
docker model run hf.co/zai-org/GLM-5.1
```
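You can also make the curl calls from the vLLM and SGLang sections from Python. What follows is a minimal sketch that uses only the standard library. It rests on a few assumptions:

- `build_chat_request` and `send` are hypothetical helper names.
- The base URL is the vLLM default of port 8000 shown in the vLLM section. The SGLang examples use port 30000 instead.
- The request body mirrors the curl payload. Nothing beyond that payload is assumed about the server.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, model: str, messages: list[dict]
) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible chat endpoint.

    Hypothetical helper: it reproduces the curl examples, i.e. a JSON body
    with "model" and "messages" sent to /v1/chat/completions.
    """
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (requires a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Suppose a server is running. Then `send(build_chat_request("http://localhost:8000", "zai-org/GLM-5.1", messages))["choices"][0]["message"]["content"]` should hold the reply text, following the OpenAI-compatible response shape.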
vLLM chat template / tool-calling issue
I was having trouble with tool calling in vLLM 0.19: "no tool calling output detected".
When serving via vLLM's OpenAI-compatible endpoint, the content field of tool-role messages is passed as a list of content blocks (e.g. [{"type": "text", "text": "..."}]) rather than a plain string.
In the chat template, that list falls through to the else branch, which re-renders tool schemas via tool_to_json(), so the model sees tool signatures instead of the actual tool output.
Also, visible_text() doesn't guard against None content, so None can appear as the literal string "None" in the final prompt.
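To see why list content loses the tool output, here is a Python port of the original tool-role branch, as it appears in the removed lines of the patch below. It is an illustrative sketch, not the template itself. Two parts are assumptions:

- `render_tool_message_original` is a hypothetical name.
- `tool_to_json` is stubbed with `json.dumps`, because the real macro's output format is not shown.

```python
import json
from typing import Any


def tool_to_json(tool: dict[str, Any]) -> str:
    # Hypothetical stand-in for the template's tool_to_json() macro;
    # the real macro's output format may differ.
    return json.dumps(tool, ensure_ascii=False)


def render_tool_message_original(content: Any, tools: list[dict[str, Any]]) -> str:
    """Sketch of the original tool-role branch of the chat template."""
    if isinstance(content, str):
        # Only plain strings reach the path that emits the tool output.
        return "<tool_response>" + content + "</tool_response>"
    # Anything else, including a list of content blocks, is treated as a
    # list of tool results whose matching tool *schemas* get rendered.
    out = "<tool_response><tools>\n"
    for tr in content:
        for tool in tools:
            if "function" in tool:
                tool = tool["function"]
            # Like Jinja's `tr.name`, a missing "name" key simply never matches.
            if tool["name"] == tr.get("name"):
                out += tool_to_json(tool) + "\n"
    return out + "</tools></tool_response>"
```

Plain text blocks such as {"type": "text", "text": "sunny"} carry no name. In the sketch, no schema matches them, so the rendered message is an empty <tools> wrapper and the text "sunny" never reaches the prompt.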
This patch fixes it for me; I'm not sure whether it will help others:
```diff
--- <unnamed>
+++ <unnamed>
@@ -32,7 +32,8 @@
 For each function call, output the function name and arguments within the following XML format:
 <tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
 {%- macro visible_text(content) -%}
- {%- if content is string -%}
+ {%- if content is none -%}
+ {%- elif content is string -%}
 {{- content }}
 {%- elif content is iterable and content is not mapping -%}
 {%- for item in content -%}
@@ -94,20 +95,32 @@
 {%- endif %}
 {%- if m.content is string -%}
 {{- '<tool_response>' + m.content + '</tool_response>' -}}
+{%- elif m.content is iterable and m.content is not mapping -%}
+ {%- set ns_tool_content = namespace(text='') -%}
+ {%- for part in m.content -%}
+ {%- if part is mapping and part.type == 'text' -%}
+ {%- set ns_tool_content.text = ns_tool_content.text + part.text -%}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- if ns_tool_content.text -%}
+ {{- '<tool_response>' + ns_tool_content.text + '</tool_response>' -}}
+ {%- else -%}
+ {{- '<tool_response><tools>\n' -}}
+ {%- for tr in m.content -%}
+ {%- for tool in tools -%}
+ {%- if 'function' in tool -%}
+ {%- set tool = tool['function'] -%}
+ {%- endif -%}
+ {%- if tool.name == tr.name -%}
+ {{- tool_to_json(tool) + '\n' -}}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- endfor -%}
+ {{- '</tools></tool_response>' -}}
+ {%- endif -%}
 {%- else -%}
- {{- '<tool_response><tools>\n' -}}
- {% for tr in m.content %}
- {%- for tool in tools -%}
- {%- if 'function' in tool -%}
- {%- set tool = tool['function'] -%}
- {%- endif -%}
- {%- if tool.name == tr.name -%}
- {{- tool_to_json(tool) + '\n' -}}
- {%- endif -%}
- {%- endfor -%}
- {%- endfor -%}
- {{- '</tools></tool_response>' -}}
-{% endif -%}
+ {{- '<tool_response>' + m.content | string + '</tool_response>' -}}
+{%- endif -%}
 {%- elif m.role == 'system' -%}
 <|system|>{{ visible_text(m.content) }}
 {%- endif -%}
```
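A Python port of the patched tool-role branch makes its behavior easy to check:

- A list of text blocks is concatenated into a single <tool_response>.
- A list with no text blocks falls back to the schema rendering.
- Any other value is stringified, as `m.content | string` does.

This is a sketch under stated assumptions:

- `render_tool_message_patched` is a hypothetical name.
- `tool_to_json` is a `json.dumps` stub.
- A text block with no "text" key is treated as empty here, whereas Jinja would raise an undefined-value error.

```python
import json
from typing import Any


def tool_to_json(tool: dict[str, Any]) -> str:
    # Hypothetical stand-in for the template's tool_to_json() macro.
    return json.dumps(tool, ensure_ascii=False)


def render_tool_message_patched(content: Any, tools: list[dict[str, Any]]) -> str:
    """Sketch of the patched tool-role branch of the chat template."""
    if isinstance(content, str):
        return "<tool_response>" + content + "</tool_response>"
    if isinstance(content, (list, tuple)):
        # Concatenate every {"type": "text"} block, like the ns_tool_content loop.
        text = "".join(
            part.get("text", "")
            for part in content
            if isinstance(part, dict) and part.get("type") == "text"
        )
        if text:
            return "<tool_response>" + text + "</tool_response>"
        # No text blocks: fall back to the original schema rendering.
        out = "<tool_response><tools>\n"
        for tr in content:
            for tool in tools:
                if "function" in tool:
                    tool = tool["function"]
                if isinstance(tr, dict) and tool["name"] == tr.get("name"):
                    out += tool_to_json(tool) + "\n"
        return out + "</tools></tool_response>"
    # Any other value is stringified, like `m.content | string` (None -> "None").
    return "<tool_response>" + str(content) + "</tool_response>"
```

Non-text parts, such as an image block, are skipped by the concatenation. This matches the patch's `part.type == 'text'` check.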
Thanks, working on it.
Check this PR: https://github.com/vllm-project/vllm/pull/39253
Use vLLM's main branch to serve GLM-5.1 with MTP, and add --chat-template-content-format=string to the launch command to avoid chat-template-related issues. Thanks!
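The --chat-template-content-format=string option asks vLLM to pass message content to the chat template as a string rather than a list of parts. If changing the server launch is not possible, one client-side analogue is to flatten list content before sending the request. This is a minimal sketch with two assumptions:

- `flatten_message_content` is a hypothetical helper name.
- Joining text parts with newlines is a choice made here, not necessarily what vLLM does internally.

```python
from typing import Any


def flatten_message_content(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of messages with list-valued content joined into a string.

    Hypothetical client-side helper: only {"type": "text"} parts are kept,
    and they are joined with newlines (an assumption of this sketch).
    """
    flattened = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            texts = [
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            ]
            # Build a new dict so the caller's message is not mutated.
            message = {**message, "content": "\n".join(texts)}
        flattened.append(message)
    return flattened
```

Other keys, such as tool_call_id on a tool message, are carried over unchanged. String content passes through untouched.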