rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models

Home Page: https://docs.rs/llm/latest/llm/


Certain quantization levels produce garbage with CUDA acceleration

philpax opened this issue

  • 7B Q3_K_M works (Philpax)
  • 7B Q5_1 does not work (Philpax)
  • 13B Q4_K_M works (Philpax)
  • 13B Q5_K_S does not work (Lukas)
  • 13B Q5_K_M works (Lukas)

My guess is that the execution graph is out of date and needs to be updated to match llama.cpp's.

The rope and rope_inplace functions could also be a source of errors, as they now take an additional parameter.
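For context, a sketch of the kind of signature drift being described. The extra parameter is assumed here to be the context length (`n_ctx`), matching upstream ggml/llama.cpp around this time; the exact name and position are an assumption, not taken from this issue, so verify against the vendored ggml headers:

```c
/* Hypothetical before/after sketch of the rope API drift, not the
 * authoritative signatures. If the Rust bindings still emit calls with
 * the old arity, arguments after the missing one are misinterpreted,
 * which would plausibly produce garbage output rather than a crash. */

/* Older signature the Rust wrapper may still be generating calls for:
 *
 * struct ggml_tensor * ggml_rope(
 *     struct ggml_context * ctx, struct ggml_tensor * a,
 *     int n_past, int n_dims, int mode);
 *
 * Assumed newer upstream signature with the extra parameter:
 *
 * struct ggml_tensor * ggml_rope(
 *     struct ggml_context * ctx, struct ggml_tensor * a,
 *     int n_past, int n_dims, int mode, int n_ctx);
 *
 * ggml_rope_inplace would need the same update.
 */
```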