[load_gguf] gguf_tensor_to_f16 failed
DenisSergeevitch opened this issue
I'm trying to launch models that are already quantized and work with llama.cpp, but they fail to load in SiLLM.
Am I missing something, or does SiLLM only work with FP16 models?
Models tried:
WizardLM-2-8x22B.Q2_K-00001-of-00005.gguf
ggml-c4ai-command-r-plus-104b-iq2_m.gguf
phi-2-orange-v2.Q8_0.gguf
GGUF support in SiLLM via MLX is currently limited to the quantizations Q4_0, Q4_1, and Q8_0, so the Q2_K and IQ2_M files above will fail with this error.
You can check the README for a list of GGUF models that have been tested and should work.
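For anyone hitting this: you can tell up front whether a file uses a supported quantization by reading the `general.file_type` field from the GGUF header before handing the file to SiLLM. Below is a minimal sketch (not part of SiLLM's API) that assumes the standard GGUF v2/v3 binary layout and llama.cpp's file-type enum values; the helper names are hypothetical.

```python
"""Check whether a GGUF file uses a quantization SiLLM/MLX can load.

A minimal sketch assuming the GGUF v2/v3 layout and llama.cpp's
LLAMA_FTYPE enum; not part of SiLLM itself.
"""
import struct
import sys

# llama.cpp LLAMA_FTYPE values for the formats MLX can currently load.
SUPPORTED_FTYPES = {0: "F32", 1: "F16", 2: "Q4_0", 3: "Q4_1", 7: "Q8_0"}


def _read_string(f):
    # GGUF string: uint64 length followed by UTF-8 bytes (no terminator).
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length).decode("utf-8")


def _read_value(f, value_type):
    # Fixed-size scalar types, keyed by GGUF metadata value-type id.
    scalar = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
              6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
    if value_type in scalar:
        fmt = scalar[value_type]
        (value,) = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
        return value
    if value_type == 8:  # string
        return _read_string(f)
    if value_type == 9:  # array: element type, count, then elements
        (elem_type,) = struct.unpack("<I", f.read(4))
        (count,) = struct.unpack("<Q", f.read(8))
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {value_type}")


def gguf_file_type(path):
    """Walk the metadata key-value pairs and return general.file_type."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        (version,) = struct.unpack("<I", f.read(4))
        if version < 2:
            raise ValueError("GGUF v1 not handled by this sketch")
        _tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
        for _ in range(kv_count):
            key = _read_string(f)
            (value_type,) = struct.unpack("<I", f.read(4))
            value = _read_value(f, value_type)
            if key == "general.file_type":
                return value
    return None


if __name__ == "__main__":
    ftype = gguf_file_type(sys.argv[1])
    if ftype in SUPPORTED_FTYPES:
        print(f"OK: file type {SUPPORTED_FTYPES[ftype]} should load")
    else:
        print(f"Unsupported file type {ftype}; re-quantize to Q4_0/Q4_1/Q8_0")
```

For a K-quant or IQ file like the ones above, the script would report an unsupported file type, which matches the `gguf_tensor_to_f16 failed` error.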