kaihangj committed on
Commit
6c40e0e
·
verified ·
1 Parent(s): 3faeeee

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -106,6 +106,12 @@ This model was obtained by quantizing the weights and activations of GLM-5 to NV
 
  ## Usage
 
+ To serve this checkpoint with [vLLM](https://github.com/vllm-project/vllm), you can start the docker `vllm/vllm-openai:latest` and run the sample command below:
+
+ ```sh
+ vllm serve nvidia/GLM-5-NVFP4 --tensor-parallel-size 8 --trust-remote-code --enable-auto-tool-choice --tool-call-parser glm47 --reasoning-parser glm45 --enable-chunked-prefill --max-num-batched-tokens 131072 --gpu-memory-utilization 0.80
+ ```
+
  To serve this checkpoint with [SGLang](https://github.com/sgl-project/sglang), you can start the docker `lmsysorg/sglang:nightly-dev-cu13-20260305-33c92732` and run the sample command below (when the nightly docker becomes unavailable, use `lmsysorg/sglang:latest`):
 
  ```sh