RuntimeError: cutlassF: no kernel found to launch

by Manmax31 - opened Dec 19, 2023

Discussion

Manmax31

Dec 19, 2023

•

edited Dec 19, 2023

I attempted to fine-tune DeciLM-7B using PEFT and SFT.

After the PEFT model is created, inference fails with the error RuntimeError: cutlassF: no kernel found to launch!

I am using Tesla V100 GPUs. I have tried the same with Mistral-7B and don't see any issues.

Here is my code:

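# Imports needed by the snippet below
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)

quantization_config = BitsAndBytesConfig(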
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)
base_model_id="Deci/DeciLM-7B"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=quantization_config,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

ft_model = PeftModel.from_pretrained(
    model, "finetuned_models/deci-7b-tuned-r32-a16"
)
pipe = pipeline("text-generation", model=ft_model, tokenizer=tokenizer)

# `prompt` is a string defined elsewhere (not shown in this post)
outputs = pipe(
    prompt,
    max_new_tokens=800,
    do_sample=True,
    temperature=0.1,
    return_full_text=False,
    repetition_penalty=1.5,
    num_beams=1,
    no_repeat_ngram_size=3,
    early_stopping=True,
)
print("\nOutput:", outputs[0]["generated_text"])

NajeebDeci

Dec 19, 2023

Hello,

This happens because SDPA (scaled dot-product attention) is not supported in your environment. It will likely be fixed in a future transformers version, since the new version checks whether SDPA is available.

Hopefully, that helps.

Manmax31

Dec 19, 2023

Thank you.
Is there a workaround for this for now?

vcoolish

Jan 22, 2024

Try it on an A100 GPU; it worked for me.
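
If switching GPUs is not an option, one thing that may be worth trying is to make PyTorch skip its fused SDPA kernels and fall back to the plain math implementation. The sketch below is not confirmed anywhere in this thread: it assumes the error comes from the fused memory-efficient kernel, and that the slower math path is acceptable. `force_math_sdpa` is a hypothetical helper name; the `torch.backends.cuda.enable_*_sdp` toggles it calls exist in PyTorch 2.0 and later.

```python
import importlib.util


def force_math_sdpa() -> bool:
    """Turn off the fused SDPA kernels so PyTorch uses the plain math path.

    Returns True if the switches were applied, False if PyTorch (or the
    SDPA backend toggles, added in PyTorch 2.0) is not available.
    """
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    cuda_backend = torch.backends.cuda
    if not hasattr(cuda_backend, "enable_mem_efficient_sdp"):
        return False

    # Assumption: the failing kernel is one of the fused ones, so disable both.
    cuda_backend.enable_flash_sdp(False)
    cuda_backend.enable_mem_efficient_sdp(False)
    # Keep the reference (math) implementation enabled as the fallback.
    cuda_backend.enable_math_sdp(True)
    return True


applied = force_math_sdpa()
print("math-only SDPA applied:", applied)
```

Call it once before loading the model and building the pipeline. The toggles are global to the process, so they affect every later attention call, and generation will likely be slower than with a fused kernel.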
