rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models

Home Page: https://docs.rs/llm/latest/llm/


WizardLM inference error: ggml-metal.m:773: false && "not implemented"

clarkmcc opened this issue

I'm getting the following error when trying to run the WizardLM-13B Q8 model. I'm running this library in a Tauri app; let me know if you need any more details or testing from me. I'm on an Apple M1 Max (64 GB).

ggml-sys-8f6d0ee10141006f/out/ggml-metal.m:773: false && "not implemented"
ggml_metal_graph_compute_block_invoke: encoding node 186, op = RMS_NORM
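For context, this is the failure mode where the Metal backend encodes each graph node by dispatching on its op, and any op/type combination without a compiled kernel hits an assertion. A minimal Rust sketch of that dispatch shape (the real ggml code is C/Objective-C, and the op and function names here are illustrative, not the actual source):

    // Illustrative sketch only: mirrors the structure of the Metal
    // backend's per-node dispatch, not ggml's actual implementation.
    #[derive(Debug)]
    enum Op { MulMat, RmsNorm, SoftMax }

    fn encode_node(index: usize, op: Op, src_is_f32: bool) {
        match op {
            Op::RmsNorm if src_is_f32 => { /* dispatch the rms_norm kernel */ }
            Op::MulMat | Op::SoftMax => { /* dispatch their kernels */ }
            other => {
                // Any unhandled op/type combination aborts, which is what
                // `false && "not implemented"` above corresponds to.
                eprintln!("encoding node {index}, op = {other:?}");
                panic!("not implemented");
            }
        }
    }

    fn main() {
        encode_node(186, Op::RmsNorm, false); // reproduces a log line like the one above
    }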

Could you try another quantization format? Maybe q5_1 or one of the K-quants?

Yeah, so vicuna 7b 2-bit k-quant works. I can try others if you'd like; this just happens to be one that I have downloaded.

Edit: vicuna 33b 2-bit k-quant also works
Edit: WizardLM 13B 4-bit k-quant does not work

The error seems to be caused by this code block in the ggml Metal shader implementation. We probably have to pull the latest changes into our repo, or check whether our way of embedding the shader code into the ggml-metal.m file creates issues. Probably something for @philpax.
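To illustrate what "embedding the shader code" could look like: a build script might splice the .metal source into ggml-metal.m as a string literal before compiling, instead of loading the file at runtime. This is a hypothetical sketch; the file paths and the placeholder marker are assumptions, not the crate's actual build.rs. A bug at this stage (truncated or stale shader source) would surface at runtime as missing kernels rather than as a build error.

    // Hypothetical build.rs sketch of shader embedding.
    use std::fs;

    fn main() -> std::io::Result<()> {
        // Read the shader and the Objective-C source that would normally
        // load it from disk at runtime.
        let shader = fs::read_to_string("ggml/ggml-metal.metal")?;
        let template = fs::read_to_string("ggml/ggml-metal.m")?;

        // Splice the shader in as a string literal. `{:?}` escapes quotes,
        // backslashes, and newlines, which for ASCII source is also valid
        // C escaping.
        let embedded = template.replace(
            "/*GGML_METAL_SHADER_SOURCE*/", // assumed placeholder marker
            &format!("{:?}", shader),
        );
        fs::write("out/ggml-metal.m", embedded)?;
        Ok(())
    }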

@clarkmcc Could you check if these models run on the llama.cpp main branch? We use it as our current ggml source, and if the error is in the shader code, we'll have to open an issue there.

@LLukas22 Running the following command seems to work just fine for me:

./main -m ~/models/wizardlm-13b-v1.1.ggmlv3.q4_K_M.bin -n 128 -ngl 1 -p "The meaning of life is "

We probably need to update our implementation of the LLaMA model. Not sure if I'll be able to get around to that soon.

According to ggerganov/llama.cpp#2508, some quantization formats are simply not implemented in Metal.
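That suggests a practical workaround: check the model's quantization format before enabling Metal acceleration and fall back to CPU otherwise. A minimal sketch, where `Quantization` and `metal_supports` are illustrative names (the llm crate does not expose an API with these exact names), and the actual support matrix depends on the ggml revision in use:

    #[allow(non_camel_case_types)]
    #[derive(Debug)]
    enum Quantization { Q4_K, Q5_1, Q8_0 }

    fn metal_supports(q: &Quantization) -> bool {
        // Per ggerganov/llama.cpp#2508, some formats lacked Metal kernels
        // at the time; treat anything not known-good as unsupported.
        !matches!(q, Quantization::Q8_0)
    }

    fn main() {
        let q = Quantization::Q8_0;
        let use_gpu = metal_supports(&q); // run on CPU otherwise
        println!("{q:?} -> offload to Metal: {use_gpu}");
    }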