GPTQ / AWQ
Hi!
I was wondering if you're planning on adding AWQ/GPTQ quants for this model? I'd usually check TheBloke's page, but he seems to have been AFK lately.
I'll consider it. The problem with AWQ/GPTQ is that those formats are less compatible/flexible than GGUF/EXL2, where you can find a quant in exactly the size that fits your VRAM.
In production, I use vLLM (the excellent aphrodite-engine fork) for fast parallel inference, but since I only have 48 GB VRAM on my systems, for Miquliz 120B I use EXL2 with ExLlamaV2 or the new 2-bit GGUF imatrix quants with llama.cpp/KoboldCpp. So I don't think AWQ/GPTQ is a good fit for 120B models as of now, and making those quants would take a huge amount of my limited resources. (I miss TheBloke, too!)
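To put rough numbers on the VRAM point, here is a back-of-the-envelope sketch of how weight size scales with bits per weight. It assumes a nominal 120e9 parameters and counts only the weights: KV cache, activations, and per-format overhead are ignored, so real files and real memory use will be somewhat larger. The function names and the list of bits-per-weight values are illustrative, not taken from any quantization tool.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params: float, bits_per_weight: float, vram_gb: float,
         overhead_gb: float = 0.0) -> bool:
    """True if weights plus an assumed fixed overhead fit into vram_gb."""
    return weight_size_gb(n_params, bits_per_weight) + overhead_gb <= vram_gb


N_PARAMS = 120e9   # assumption: nominal parameter count of a "120B" model
VRAM_GB = 48       # the VRAM budget mentioned above

if __name__ == "__main__":
    # Illustrative bits-per-weight values; 4.0 is a typical AWQ/GPTQ setting.
    for bpw in (2.0, 2.4, 3.0, 4.0):
        size = weight_size_gb(N_PARAMS, bpw)
        verdict = "fits" if fits(N_PARAMS, bpw, VRAM_GB) else "does not fit"
        print(f"{bpw:.1f} bpw: ~{size:.0f} GB of weights, {verdict} in {VRAM_GB} GB")
```

By this estimate a 4-bit quant needs about 60 GB for the weights alone, which is why a format that lets you pick a size between 2 and 3 bits per weight is the practical choice on 48 GB.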