[load_gguf] gguf_tensor_to_f16 failed
DenisSergeevitch opened this issue
I'm trying to launch models that are already quantized and work with llama.cpp, but they fail to load in SiLLM.
Am I missing something, or does SiLLM only work with FP16 models?
Models tried:
WizardLM-2-8x22B.Q2_K-00001-of-00005.gguf
ggml-c4ai-command-r-plus-104b-iq2_m.gguf
phi-2-orange-v2.Q8_0.gguf
GGUF support in SiLLM via MLX is currently limited to the quantizations Q4_0, Q4_1, and Q8_0, so the Q2_K and IQ2_M files above will fail with this error.
You can check the README for a list of GGUF models that have been tested and should work.
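For anyone hitting this: you can tell up front whether a file uses a supported quantization by reading the `general.file_type` field from the GGUF header before handing the file to SiLLM. Below is a minimal sketch (not part of SiLLM's API) that assumes the standard GGUF v2/v3 binary layout and llama.cpp's file-type enum values; the helper names are hypothetical.

```python
"""Check whether a GGUF file uses a quantization SiLLM/MLX can load.

A minimal sketch assuming the GGUF v2/v3 layout and llama.cpp's
LLAMA_FTYPE enum; not part of SiLLM itself.
"""
import struct
import sys

# llama.cpp LLAMA_FTYPE values for the formats MLX can currently load.
SUPPORTED_FTYPES = {0: "F32", 1: "F16", 2: "Q4_0", 3: "Q4_1", 7: "Q8_0"}


def _read_string(f):
    # GGUF string: uint64 length followed by UTF-8 bytes (no terminator).
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length).decode("utf-8")


def _read_value(f, value_type):
    # Fixed-size scalar types, keyed by GGUF metadata value-type id.
    scalar = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
              6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
    if value_type in scalar:
        fmt = scalar[value_type]
        (value,) = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
        return value
    if value_type == 8:  # string
        return _read_string(f)
    if value_type == 9:  # array: element type, count, then elements
        (elem_type,) = struct.unpack("<I", f.read(4))
        (count,) = struct.unpack("<Q", f.read(8))
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {value_type}")


def gguf_file_type(path):
    """Walk the metadata key-value pairs and return general.file_type."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        (version,) = struct.unpack("<I", f.read(4))
        if version < 2:
            raise ValueError("GGUF v1 not handled by this sketch")
        _tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
        for _ in range(kv_count):
            key = _read_string(f)
            (value_type,) = struct.unpack("<I", f.read(4))
            value = _read_value(f, value_type)
            if key == "general.file_type":
                return value
    return None


if __name__ == "__main__":
    ftype = gguf_file_type(sys.argv[1])
    if ftype in SUPPORTED_FTYPES:
        print(f"OK: file type {SUPPORTED_FTYPES[ftype]} should load")
    else:
        print(f"Unsupported file type {ftype}; re-quantize to Q4_0/Q4_1/Q8_0")
```

For a K-quant or IQ file like the ones above, the script would report an unsupported file type, which matches the `gguf_tensor_to_f16 failed` error.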