Has the NVFP4 version of the model undergone Quantization-Aware Training (QAT)?

#2
by Jianqiao1 - opened

It appears you have already applied post-training quantization (PTQ). Would it be possible to apply Quantization-Aware Training (QAT) specifically for the NVFP4 format? Although the main branch of llama.cpp now supports NVFP4, its KL divergence (KLD) from the full-precision model is still worse (higher) than that of the traditional Q4_K_M format. If QAT could bring the NVFP4 model's accuracy close to that of the original full-precision model, it would be virtually unrivaled within its size class.
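For anyone unfamiliar with the KLD metric mentioned above, here is a minimal sketch of how I understand it: at each token position, compare the full-precision model's next-token distribution with the quantized model's, then average over positions. This is not llama.cpp's own implementation, the function names are my own, and the logits are made-up toy values rather than real measurements.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one vocabulary-sized logit vector.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_kld(ref_logits, quant_logits):
    # KL(P_ref || P_quant) for a single token position, in nats.
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kld(ref_batch, quant_batch):
    # Average the per-token KLD over all positions in the evaluation text.
    values = [token_kld(r, q) for r, q in zip(ref_batch, quant_batch)]
    return sum(values) / len(values)

# Toy logits (hypothetical): two token positions, vocabulary of three.
ref = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]    # stands in for full precision
close = [[1.9, 1.1, 0.1], [0.5, 0.6, 2.9]]  # small quantization error
far = [[1.0, 2.0, 0.1], [3.0, 0.5, 0.5]]    # large quantization error

kld_close = mean_kld(ref, close)
kld_far = mean_kld(ref, far)
print(f"close: {kld_close:.6f} nats, far: {kld_far:.6f} nats")
```

Lower is better: a value of zero means the quantized model reproduces the full-precision distribution exactly, so "inferior KLD" here means a higher number.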

DeepSeek V4 Flash was already designed for mixed FP4+FP8 precision when it was released. As a result, many of its GGUF conversions end up larger than the original safetensors models. Furthermore, the versions suitable for single-machine deployment, such as ds4.c, lose significant accuracy and sometimes perform even worse than Qwen 3.6 27B.
