FP8 Quantization?

by MoRanYue - opened 1 day ago


At 10 GB, the model may be too large to run inference on low-VRAM GPUs. Will there be a mixed-precision FP8, or even FP4, quantized version?
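For context, the size reduction being asked about can be estimated from bits per weight. This is a rough sketch under stated assumptions, not anything from the model authors: it assumes the ~10 GB checkpoint stores weights in a 16-bit format such as BF16. It also ignores activations, KV cache, and the per-block scale factors that FP4 formats add. The helper names and the 90% quantized fraction used for the "mixed" case are hypothetical.

```python
# Rough weight-memory estimate for re-quantizing a checkpoint.
# Assumption: the ~10 GB checkpoint stores weights in 16-bit (BF16/FP16).
# Ignores activations, KV cache, and quantization scale overhead.

BITS_PER_WEIGHT = {"bf16": 16, "fp8": 8, "fp4": 4}


def estimate_weight_gb(checkpoint_gb: float, source_bits: int, target_bits: int) -> float:
    """Scale the checkpoint size by the ratio of bits per weight."""
    return checkpoint_gb * target_bits / source_bits


def mixed_precision_gb(checkpoint_gb: float, source_bits: int, target_bits: int,
                       quantized_fraction: float) -> float:
    """Hypothetical mixed scheme: quantize a fraction of the weights and
    keep the rest (e.g. sensitive layers) at the source precision."""
    avg_bits = quantized_fraction * target_bits + (1 - quantized_fraction) * source_bits
    return checkpoint_gb * avg_bits / source_bits


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        size = estimate_weight_gb(10.0, BITS_PER_WEIGHT["bf16"], bits)
        print(f"{name}: ~{size:.1f} GB")
    # Mixed FP8: 90% of weights in FP8, 10% left in BF16 (illustrative split).
    print(f"mixed fp8: ~{mixed_precision_gb(10.0, 16, 8, 0.9):.1f} GB")
```

Under these assumptions, FP8 roughly halves the weight footprint (~5 GB) and FP4 roughly quarters it (~2.5 GB). A mixed scheme lands between the FP8 and BF16 figures, depending on how many layers stay in higher precision.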

Thanks for raising this! We haven’t tested model quantization yet, but we’ll consider providing a quantized version once its stability is confirmed.
