nvidia/GLM-5.1-NVFP4
Text Generation
Safetensors
Model Optimizer
glm_moe_dsa
nvidia
ModelOpt
quantized
4-bit precision
FP4
fp4
conversational
8-bit precision
modelopt
License: mit
Add FP8 KV cache quantization #1
by chenjiel - opened 7 days ago
base: refs/heads/main ← from: refs/pr/1
Files changed: +6 -1
chenjiel (NVIDIA org), 7 days ago:
No description provided.
Add FP8 KV cache quantization (c1881f79)
Update hf_quant_config.json (96ead2f5)
chenjiel changed pull request status to merged 7 days ago
chenjiel deleted the refs/pr/1 ref 7 days ago