rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models

Home Page: https://docs.rs/llm/latest/llm/


Certain quantization levels produce garbage with CUDA acceleration

philpax opened this issue

  • 7B Q3_K_M works (Philpax)
  • 7B Q5_1 does not work (Philpax)
  • 13B Q4_K_M works (Philpax)
  • 13B Q5_K_S does not work (Lukas)
  • 13B Q5_K_M works (Lukas)

My guess is that the execution graph is out of date and needs to be updated to match llama.cpp's.

The rope and rope_inplace functions could also be a source of errors, as they now take an additional parameter.
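For context, a sketch of the kind of signature drift being described. The extra parameter is assumed here to be the context length (`n_ctx`), matching upstream ggml/llama.cpp around this time; the exact name and position are an assumption, not taken from this issue, so verify against the vendored ggml headers:

```c
/* Hypothetical before/after sketch of the rope API drift, not the
 * authoritative signatures. If the Rust bindings still emit calls with
 * the old arity, arguments after the missing one are misinterpreted,
 * which would plausibly produce garbage output rather than a crash. */

/* Older signature the Rust wrapper may still be generating calls for:
 *
 * struct ggml_tensor * ggml_rope(
 *     struct ggml_context * ctx, struct ggml_tensor * a,
 *     int n_past, int n_dims, int mode);
 *
 * Assumed newer upstream signature with the extra parameter:
 *
 * struct ggml_tensor * ggml_rope(
 *     struct ggml_context * ctx, struct ggml_tensor * a,
 *     int n_past, int n_dims, int mode, int n_ctx);
 *
 * ggml_rope_inplace would need the same update.
 */
```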