What is the difference between fp8 scaled and mxfp8?

by shivshankar - opened 21 days ago

Discussion

shivshankar

21 days ago

Does the 4090 support both?

eepos

21 days ago

MXFP8 is Blackwell only.

Kijai

Comfy Org org 21 days ago

MXFP8 is block-wise scaled, with hardware-level support in the latest NVIDIA GPUs (Blackwell). It gives better quality than fp8 scaled.

On a 4090 this means the weights have to be dequantized on the fly, which makes it slower to run, but you still get the quality benefits.

On a 50xx GPU it would be roughly 30% faster than bf16.
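To illustrate what dequantizing on the fly involves, here is a minimal NumPy sketch. It is not how ComfyUI or any particular backend implements it: the function names are hypothetical, the stored FP8 values are replaced by small float stand-ins, and the scales are plain floats. The point is only that, without native support, every layer call first expands the block scales back into full-size weights and then runs an ordinary matmul.

```python
import numpy as np

BLOCK = 32  # MXFP8 block size: 32 weights share one scale


def dequant_blockwise(q, scales):
    """Multiply each 32-element block of q by its own scale (hypothetical helper)."""
    rows, cols = q.shape
    blocks = q.reshape(rows, cols // BLOCK, BLOCK)
    return (blocks * scales[:, :, None]).reshape(rows, cols)


def linear_without_native_support(x, q, scales):
    # Extra work on every call: rebuild higher-precision weights first,
    # then do a normal matmul in that precision.
    w = dequant_blockwise(q, scales).astype(np.float32)
    return x @ w.T


rng = np.random.default_rng(0)
# Stand-in for stored FP8 values (real files hold 8-bit codes, not float32).
q = rng.integers(-8, 9, size=(4, 64)).astype(np.float32)
# One scale per block of 32 weights; powers of two are used here for simplicity.
scales = np.exp2(rng.integers(-3, 3, size=(4, 64 // BLOCK))).astype(np.float32)
x = rng.normal(size=(2, 64)).astype(np.float32)

y = linear_without_native_support(x, q, scales)
print(y.shape)
```

The dequantization step is the cost referred to above. Hardware with native block-scaled FP8 support can skip the separate expansion.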

shivshankar

21 days ago

How much quality difference is there? Is the quality identical?

mingyi456

13 days ago

@shivshankar MXFP8 uses a block size of 32, meaning every 32 quantized weights share a single scale factor. Scaled fp8 generally has either a single scale for the entire tensor (1 scale for about 9.4 million weights in a 3072x3072 tensor) or a single scale per tensor row (1 scale per 3072 weights). More granular scales definitely increase quality.
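The effect of scale granularity can be sketched with a small NumPy simulation. Everything here is an assumption made for illustration: `fake_e4m3` only imitates FP8 E4M3 rounding in float64, all three schemes use a plain floating-point scale of amax / 448 (real MXFP8 stores power-of-two block scales), and the outliers are injected on purpose. On well-behaved weights the gap is likely to be much smaller.

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value in FP8 E4M3


def fake_e4m3(x):
    """Round to the nearest E4M3-representable value (simulation, not real FP8)."""
    x = np.clip(x, -E4M3_MAX, E4M3_MAX)
    _, e = np.frexp(np.abs(x))      # |x| = m * 2**e with m in [0.5, 1)
    exp = np.maximum(e - 1, -6)     # floor(log2|x|), clamped at the min normal exponent
    step = np.exp2(exp - 3.0)       # 3 mantissa bits -> grid spacing 2**(exp - 3)
    return np.round(x / step) * step


def quant_dequant(w, group_shape):
    """Quantize with one scale per group of shape group_shape, then dequantize."""
    rows, cols = w.shape
    gr, gc = group_shape
    blocks = w.reshape(rows // gr, gr, cols // gc, gc)
    amax = np.abs(blocks).max(axis=(1, 3), keepdims=True)
    scale = np.maximum(amax, 1e-12) / E4M3_MAX
    deq = fake_e4m3(blocks / scale) * scale
    return deq.reshape(rows, cols), scale.size


rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=(64, 128))
w[::4, 0] = 1e5  # synthetic outliers in every 4th row

schemes = [
    ("tensor-wise", w.shape),        # 1 scale for the whole tensor
    ("row-wise", (1, w.shape[1])),   # 1 scale per row
    ("block of 32", (1, 32)),        # 1 scale per 32 weights, as in MXFP8
]
results = {}
for name, group_shape in schemes:
    deq, n_scales = quant_dequant(w, group_shape)
    results[name] = (n_scales, float(np.mean((deq - w) ** 2)))
    print(f"{name}: {n_scales} scales, MSE = {results[name][1]:.6f}")
```

With a coarse scale, a single outlier forces every weight sharing that scale into the low-precision end of the FP8 range. Smaller groups confine that damage to fewer weights.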

However, in my personal opinion, for mxfp8 the quality increase is probably not worth the size increase over scaled fp8.
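For a rough sense of the size increase, the arithmetic for the 3072x3072 example can be written out as below. The byte counts are assumptions: 1 byte per FP8 weight, 1 byte per MXFP8 block scale, and 4 bytes (float32) per scale for the scaled fp8 variants. Actual files in this repo may store scales or extra metadata differently.

```python
ROWS, COLS = 3072, 3072   # tensor shape from the example above
BLOCK = 32                # MXFP8 block size
n_weights = ROWS * COLS   # 1 byte each in FP8

# Assumed scale storage: float32 (4 bytes) for scaled fp8,
# 1 byte per block scale for MXFP8.
sizes = {
    "fp8_scaled (tensor-wise)": n_weights + 4,
    "fp8_scaled (row-wise)": n_weights + 4 * ROWS,
    "mxfp8 (block of 32)": n_weights + n_weights // BLOCK,
}

overheads = {}
for name, size in sizes.items():
    overheads[name] = 100 * (size - n_weights) / n_weights
    print(f"{name}: {size} bytes, {overheads[name]:.4f}% scale overhead")
```

Under these assumptions the MXFP8 scales add 1/32 of the weight size, about 3.1%, while row-wise float32 scales add roughly 0.13% and a single tensor-wise scale is negligible.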

easygoing0114

10 days ago

@shivshankar , @mingyi456

I compared the image quality and generation speed of mxfp8 and fp8_scaled on an RTX 4060 Ti.

mxfp8 may not be the best choice for the RTX 4090.

mingyi456

9 days ago

@easygoing0114 Thanks for the work. The fp8 scaled quant in this repo uses a single tensor-wise scale. Do you know how to create a row-wise scaled fp8 quant? That should be closer to mxfp8 in quality.

easygoing0114

9 days ago

@mingyi456 Thanks for the explanation. I used the convert_to_quant library to create both fp8_scaled and mxfp8. The fp8_scaled variant is probably tensor-wise, and honestly, I'm not sure how to achieve row-wise scaling with it.
