nvidia/GLM-5.1-NVFP4

Text Generation

Model Optimizer

4-bit precision

8-bit precision

Add FP8 KV cache quantization

#1

by chenjiel - opened 7 days ago

base: refs/heads/main ← from: refs/pr/1

NVIDIA org 7 days ago

No description provided.

Add FP8 KV cache quantization (c1881f79)

Update hf_quant_config.json (96ead2f5)

chenjiel changed pull request status to merged 7 days ago

chenjiel deleted the refs/pr/1 ref 7 days ago
