FP8 Quantization?
#1
by MoRanYue - opened
10 GB may be too large for inference on low-VRAM GPUs. Will there be a mixed FP8, or even FP4, quantized version?
Thanks for raising this! We haven’t tested model quantization yet, but we’ll consider providing a quantized version once its stability is confirmed.
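For a rough sense of what quantization could save, here is a minimal back-of-the-envelope sketch. It assumes the 10 GB checkpoint stores every weight at 16 bits (BF16/FP16), which has not been confirmed in this thread, and it counts weights only: activations, caches, and the per-block scale factors that FP8/FP4 formats usually carry are ignored. The helper name `weight_size_gb` is hypothetical and is not part of any library.

```python
def weight_size_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (10^9 bytes) for a given bit width."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption (not confirmed in this thread): the 10 GB checkpoint is 16-bit.
checkpoint_gb = 10.0
assumed_bits = 16

# Back out the parameter count implied by that assumption.
num_params = checkpoint_gb * 1e9 * 8 / assumed_bits

# Compare the weight footprint at each candidate precision.
for name, bits in [("BF16/FP16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: ~{weight_size_gb(num_params, bits):.1f} GB")
```

Under that assumption, FP8 would roughly halve the weight footprint and FP4 would roughly quarter it. A mixed scheme that keeps sensitive layers at higher precision would land somewhere in between.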