intel / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

Repository from GitHub: https://github.com/intel/ipex-llm

Run Qwen GGUF with the ipex-llm transformers Python API

KiwiHana opened this issue

Please add support for loading DeepSeek-R1:32B and Qwen2.5-1.5B / Qwen2.5-3B GGUF models through the ipex-llm transformers Python API.

GGUF Q4_0, Q4_1, or Q4_K_M would be fine; Q4_K_M support would be preferred.

C:\Users\Lengda\Documents\spec-decode-0325>kai-xpu-0324\python.exe gguf_speculative_decoding.py
model_family:qwen2
2025-03-31 13:56:47,437 - ERROR -
 
****************************Usage Error************************
Unsupported model family: qwen2
2025-03-31 13:56:47,437 - ERROR -
 
****************************Call Stack*************************
Traceback (most recent call last):
  File "C:\Users\Lengda\Documents\spec-decode-0325\gguf_speculative_decoding.py", line 34, in <module>
    model, tokenizer = AutoModelForCausalLM.from_gguf(checkpoint)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Lengda\Documents\spec-decode-0325\kai-xpu-0324\Lib\site-packages\ipex_llm\transformers\model.py", line 405, in from_gguf
    model, tokenizer = load_gguf_model(fpath, dtype=torch.half, low_bit=low_bit)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Lengda\Documents\spec-decode-0325\kai-xpu-0324\Lib\site-packages\ipex_llm\transformers\gguf\api.py", line 73, in load_gguf_model
    invalidInputError(False, f"Unsupported model family: {model_family}")
  File "C:\Users\Lengda\Documents\spec-decode-0325\kai-xpu-0324\Lib\site-packages\ipex_llm\utils\common\log4Error.py", line 32, in invalidInputError
    raise RuntimeError(errMsg)
RuntimeError: Unsupported model family: qwen2
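For reference, here is a minimal sketch of the loading path that triggers this error; the GGUF file path and prompt are hypothetical, and the `from_gguf` call is the one shown in the traceback above:

```python
import torch
from ipex_llm.transformers import AutoModelForCausalLM

# Hypothetical local Qwen2.5 GGUF file (Q4_0 / Q4_1 / Q4_K_M as requested above)
checkpoint = r"C:\models\qwen2.5-3b-instruct-q4_k_m.gguf"

# Currently raises: RuntimeError: Unsupported model family: qwen2
model, tokenizer = AutoModelForCausalLM.from_gguf(checkpoint)
model = model.half().to("xpu")

inputs = tokenizer("What is AI?", return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```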

Any idea how to run a Qwen3 GGUF model?

You can use the Intel version of Ollama instead: https://www.modelscope.cn/models/Intel/ollama
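As a rough workaround sketch (the model file name and tag are hypothetical), a local GGUF can be imported into that Ollama build with a Modelfile and then run from the command line:

```
# Modelfile
FROM ./qwen2.5-3b-instruct-q4_k_m.gguf
```

```
ollama create qwen2.5-3b -f Modelfile
ollama run qwen2.5-3b
```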