llama-2-70b.ggmlv3.q4_K_S.bin : wrong shape errors

by mikeee - opened Jul 27, 2023

Jul 27, 2023

error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected  8192 x  8192, got  8192 x1024
llama_init_from_file: failed to load model
Traceback (most recent call last):
  File "/home/mu2018/github/llama2-7b-chat-ggml/app.py", line 126, in <module>
    LLM = AutoModelForCausalLM.from_pretrained(
  File "/home/mu2018/github/llama2-7b-chat-ggml/.venv/lib/python3.10/site-packages/ctransformers/hub.py", line 157, in from_pretrained
    return LLM(
  File "/home/mu2018/github/llama2-7b-chat-ggml/.venv/lib/python3.10/site-packages/ctransformers/llm.py", line 214, in __init__
    raise RuntimeError(
RuntimeError: Failed to create LLM 'llama' from 'models/llama-2-70b.ggmlv3.q4_K_S.bin'.

Thanks very much for providing all these ggml models. Really awesome!

I tried some of the llama-2-7b, llama-2-13b ggml models and all run without a problem.

llama-2-70b.ggmlv3.q4_K_S.bin, however, seems to have a problem. I tried it with Python 3.10.6 and ctransformers on Ubuntu 22. I'll probably try again with llama-2-70b.ggmlv3.q4_0.bin.

mikeee

Jul 29, 2023

This is what I found out, as far as I understand it: the 70b model is slightly different from the 7b and 13b ones and needs -gqa 8 to be set in llama.cpp (gqa = grouped-query attention). ctransformers 0.2.15 is able to handle the 70b model now, though.
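
For anyone hitting the same error, the numbers in the message can be worked out by hand. Below is a rough sketch, assuming the published Llama 2 70B configuration (embedding width 8192, 64 attention heads, 8 key/value heads). The function name expected_wk_shape is made up for illustration and is not part of llama.cpp or ctransformers.

```python
def expected_wk_shape(n_embd: int, n_head: int, n_gqa: int = 1) -> tuple[int, int]:
    """Return the (rows, cols) shape of the attention key projection (wk).

    n_gqa is the grouped-query attention factor: how many query heads
    share one key/value head. n_gqa=1 means ordinary multi-head attention.
    """
    if n_head % n_gqa != 0:
        raise ValueError("n_head must be divisible by n_gqa")
    head_dim = n_embd // n_head      # width of a single attention head
    n_head_kv = n_head // n_gqa      # number of key/value heads
    return (n_embd, head_dim * n_head_kv)


# Assumed Llama 2 70B hyperparameters (not read from the model file).
N_EMBD = 8192
N_HEAD = 64

# Without the gqa setting the loader assumes one KV head per query head.
print(expected_wk_shape(N_EMBD, N_HEAD, n_gqa=1))  # (8192, 8192)

# With gqa = 8 there are only 64 / 8 = 8 KV heads of width 128 each.
print(expected_wk_shape(N_EMBD, N_HEAD, n_gqa=8))  # (8192, 1024)
```

The first result matches the "expected 8192 x 8192" in the error and the second matches the "got 8192 x1024", which is consistent with the loader not knowing about the factor of 8.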

mikeee changed discussion status to closed Jul 29, 2023
