GPTQ / AWQ

by agahebr - opened Mar 1, 2024

Mar 1, 2024

Hi!

I was wondering if you're planning on adding AWQ/GPTQ support for this model? I'd usually check TheBloke's page, but he seems to have been AFK recently.

wolfram

Owner Mar 2, 2024

I'll consider it. The problem with AWQ/GPTQ is that those formats are less compatible/flexible than GGUF/EXL2, where you can find a quant in exactly the size that fits your VRAM.

In production, I use vLLM (the excellent aphrodite-engine fork) for fast parallel inference, but since I only have 48 GB of VRAM on my systems, for Miquliz 120B I use EXL2 with ExLlamaV2 or the new 2-bit GGUF imatrix quants with llama.cpp/KoboldCpp. So I don't think AWQ/GPTQ is a good fit for 120B models as of now, and making them would take a huge amount of my limited resources. (I miss TheBloke, too!)
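
To put rough numbers on the VRAM point, here is a back-of-the-envelope sketch. It assumes "120B" means exactly 120 billion parameters and counts GB as 10^9 bytes. It only counts the weights, so KV cache, activations, and per-group quantization metadata (scales, zero points) come on top. The function names are made up for illustration and don't belong to any library.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def max_bits_per_weight(n_params: float, vram_gb: float, reserve_gb: float = 0.0) -> float:
    """Largest average bits-per-weight whose weights fit in the VRAM budget.

    reserve_gb is a hypothetical allowance for KV cache and other overhead.
    """
    usable_bytes = (vram_gb - reserve_gb) * 1e9
    return usable_bytes * 8 / n_params


N_PARAMS = 120e9  # assumption: "120B" taken as exactly 120 billion parameters
VRAM_GB = 48      # from the post above

for bpw in (4.0, 3.0, 2.0):
    size = weight_size_gb(N_PARAMS, bpw)
    verdict = "fits" if size <= VRAM_GB else "does not fit"
    print(f"{bpw:.1f} bpw: ~{size:.0f} GB of weights, {verdict} in {VRAM_GB} GB")

print(f"max average bpw with no reserve: {max_bits_per_weight(N_PARAMS, VRAM_GB):.2f}")
```

Under these assumptions, the 4-bit weights alone come to about 60 GB, which is already over 48 GB before any cache or overhead, while anything up to roughly 3.2 bits per weight could fit. AWQ/GPTQ quants are commonly published at 4 bits, whereas EXL2 and GGUF quants are available at lower and intermediate sizes, and that is the flexibility I mean.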
