armbues / SiLLM

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

[load_gguf] gguf_tensor_to_f16 failed

DenisSergeevitch opened this issue

I'm trying to load already-quantized models that work with llama.cpp, but they fail to load with SiLLM.

Am I missing something, or does SiLLM only work with FP16 models?

Models tried:

WizardLM-2-8x22B.Q2_K-00001-of-00005.gguf
ggml-c4ai-command-r-plus-104b-iq2_m.gguf
phi-2-orange-v2.Q8_0.gguf
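
For reference, a minimal sketch of the kind of call that fails (assuming the `sillm.load` entry point shown in the README; the path is just one of the models listed above):

```python
# Hypothetical repro sketch, assuming sillm.load from the SiLLM README.
import sillm

# Loading an IQ2_M-quantized GGUF raises "[load_gguf] gguf_tensor_to_f16 failed",
# while an FP16 model of the same architecture loads fine.
model = sillm.load("ggml-c4ai-command-r-plus-104b-iq2_m.gguf")
```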

GGUF support in SiLLM via MLX is currently limited to quantizations Q4_0, Q4_1 and Q8_0.

You can check the readme for a list of GGUF models that have been tested and should work.
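
If you want to check up front whether a GGUF file sticks to the supported quantizations, one option is to inspect its tensor types. A hedged sketch using the `gguf` Python package from the llama.cpp project (`GGUFReader` and its `tensor_type` field are that package's API, not SiLLM's):

```python
# Sketch: list the quantization types used by a GGUF file, to see whether
# it only uses the Q4_0 / Q4_1 / Q8_0 types that MLX can load.
# Assumes the `gguf` Python package from llama.cpp (pip install gguf).
from gguf import GGUFReader
from gguf.constants import GGMLQuantizationType

SUPPORTED = {"F16", "F32", "Q4_0", "Q4_1", "Q8_0"}

def check_gguf(path: str) -> None:
    reader = GGUFReader(path)
    # Collect the distinct quantization types across all tensors in the file.
    types = {GGMLQuantizationType(t.tensor_type).name for t in reader.tensors}
    unsupported = sorted(types - SUPPORTED)
    print(f"{path}: tensor types = {sorted(types)}")
    if unsupported:
        print(f"  not supported for GGUF loading via MLX: {unsupported}")

check_gguf("phi-2-orange-v2.Q8_0.gguf")
```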