rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models

Home Page: https://docs.rs/llm/latest/llm/


WizardLM inference error: ggml-metal.m:773: false && "not implemented"

clarkmcc opened this issue

I'm getting the following error when trying to run the WizardLM-13B Q8 model. I'm running this library in a Tauri app; let me know if you need any more details or testing from me. I'm on an Apple M1 Max (64 GB).

ggml-sys-8f6d0ee10141006f/out/ggml-metal.m:773: false && "not implemented"
ggml_metal_graph_compute_block_invoke: encoding node 186, op = RMS_NORM
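For context, this is the failure mode where the Metal backend encodes each graph node by dispatching on its op, and any op/type combination without a compiled kernel hits an assertion. A minimal Rust sketch of that dispatch shape (the real ggml code is C/Objective-C, and the op and function names here are illustrative, not the actual source):

    // Illustrative sketch only: mirrors the structure of the Metal
    // backend's per-node dispatch, not ggml's actual implementation.
    #[derive(Debug)]
    enum Op { MulMat, RmsNorm, SoftMax }

    fn encode_node(index: usize, op: Op, src_is_f32: bool) {
        match op {
            Op::RmsNorm if src_is_f32 => { /* dispatch the rms_norm kernel */ }
            Op::MulMat | Op::SoftMax => { /* dispatch their kernels */ }
            other => {
                // Any unhandled op/type combination aborts, which is what
                // `false && "not implemented"` above corresponds to.
                eprintln!("encoding node {index}, op = {other:?}");
                panic!("not implemented");
            }
        }
    }

    fn main() {
        encode_node(186, Op::RmsNorm, false); // reproduces a log line like the one above
    }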

Could you try another quantization format? Maybe q5_1 or one of the K-quants?

Yeah, so vicuna 7b 2-bit k-quant works. I can try others if you'd like; this just happens to be one that I have downloaded.

Edit: vicuna 33b 2-bit k-quant also works
Edit: WizardLM 13B 4-bit k-quant does not work

The error seems to be caused by this code block in the ggml Metal shader implementation. We probably have to pull the latest changes into our repo, or check whether our way of embedding the shader code into the ggml-metal.m file creates issues. Probably something for @philpax.
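To illustrate what "embedding the shader code" could look like: a build script might splice the .metal source into ggml-metal.m as a string literal before compiling, instead of loading the file at runtime. This is a hypothetical sketch; the file paths and the placeholder marker are assumptions, not the crate's actual build.rs. A bug at this stage (truncated or stale shader source) would surface at runtime as missing kernels rather than as a build error.

    // Hypothetical build.rs sketch of shader embedding.
    use std::fs;

    fn main() -> std::io::Result<()> {
        // Read the shader and the Objective-C source that would normally
        // load it from disk at runtime.
        let shader = fs::read_to_string("ggml/ggml-metal.metal")?;
        let template = fs::read_to_string("ggml/ggml-metal.m")?;

        // Splice the shader in as a string literal. `{:?}` escapes quotes,
        // backslashes, and newlines, which for ASCII source is also valid
        // C escaping.
        let embedded = template.replace(
            "/*GGML_METAL_SHADER_SOURCE*/", // assumed placeholder marker
            &format!("{:?}", shader),
        );
        fs::write("out/ggml-metal.m", embedded)?;
        Ok(())
    }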

@clarkmcc Could you check if these models run on the llama.cpp main branch? We use it as our current ggml source, and if the error is in the shader code, we'll have to open an issue there.

@LLukas22 Running the following command seems to work just fine for me:

./main -m ~/models/wizardlm-13b-v1.1.ggmlv3.q4_K_M.bin -n 128 -ngl 1 -p "The meaning of life is "

We probably need to update our implementation of the LLaMA model. Not sure if I'll be able to get around to that soon.

According to ggerganov/llama.cpp#2508, some quantization formats are simply not implemented in Metal.
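That suggests a practical workaround: check the model's quantization format before enabling Metal acceleration and fall back to CPU otherwise. A minimal sketch, where `Quantization` and `metal_supports` are illustrative names (the llm crate does not expose an API with these exact names), and the actual support matrix depends on the ggml revision in use:

    #[allow(non_camel_case_types)]
    #[derive(Debug)]
    enum Quantization { Q4_K, Q5_1, Q8_0 }

    fn metal_supports(q: &Quantization) -> bool {
        // Per ggerganov/llama.cpp#2508, some formats lacked Metal kernels
        // at the time; treat anything not known-good as unsupported.
        !matches!(q, Quantization::Q8_0)
    }

    fn main() {
        let q = Quantization::Q8_0;
        let use_gpu = metal_supports(&q); // run on CPU otherwise
        println!("{q:?} -> offload to Metal: {use_gpu}");
    }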