intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
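For context on the issue below: ipex-llm exposes a HuggingFace Transformers-style Python API, and the log later in this report shows a model being converted to the sym_int4 format on load. A minimal loading sketch, assuming that API and an Intel GPU (XPU) device (the model id is a placeholder for the MiniCPM checkpoint used in the report, and `load_in_4bit=True` is assumed to correspond to the sym_int4 conversion mentioned in the log):

```python
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "openbmb/MiniCPM-2B-dpo-bf16"  # placeholder; a local path also works

# load_in_4bit=True converts the weights to a low-bit (sym_int4) format at load time.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,
    trust_remote_code=True,
)
model = model.to("xpu")  # move the quantized model to the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```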

MiniCPM benchmark run fails with an error on iGPU

violet17 opened this issue

log:

python run.py
C:\Users\mi\miniconda3\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-05-21 20:54:40,883 - INFO - intel_extension_for_pytorch auto imported
C:\Users\mi\miniconda3\lib\site-packages\transformers\deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
C:\Users\mi\miniconda3\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
2024-05-21 20:54:46,922 - INFO - Converting the current model to sym_int4 format......
>> loading of model costs 19.866792400000122s and 2.544921875GB
<class 'transformers_modules.MiniCPM-2B-dpo-bf16.modeling_minicpm.MiniCPMForCausalLM'>
C:\Users\mi\miniconda3\lib\site-packages\transformers\generation\configuration_utils.py:515: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
C:\Users\mi\miniconda3\lib\site-packages\transformers\generation\configuration_utils.py:520: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
2024-05-21 20:55:01,157 - WARNING - The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
2024-05-21 20:55:01,157 - WARNING - Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Assertion failed: nb % SBS == 0, file dequantize.cpp, line 23

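The two `do_sample` warnings in the log are benign (greedy decoding simply ignores `temperature`/`top_p`), and the attention-mask/pad-token warnings can be avoided by passing those values explicitly to `generate`. A hedged sketch, continuing from the loading example near the top (`model` and `tokenizer` are assumed to exist; the prompt and token count are placeholders):

```python
import torch

prompt = "What is AI?"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,  # silences the attention-mask warning
        pad_token_id=tokenizer.eos_token_id,   # silences the pad_token_id warning
        do_sample=False,                       # greedy decoding; temperature/top_p are ignored
        max_new_tokens=32,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that the final assertion failure in dequantize.cpp is the actual bug being reported; it is not affected by these generation arguments.
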
version:

ipex-llm                                2.1.0b20240519
intel-extension-for-pytorch             2.1.20+git4849f3b
pytorch-lightning                       2.2.2
pytorch-wpe                             0.0.1
rotary-embedding-torch                  0.5.3
torch                                   2.1.0a0+git7bcf7da
torch-complex                           0.4.3
torchaudio                              2.1.0+6ea1133
torchmetrics                            1.3.2
torchvision                             0.16.0a0+cxx11.abi

model:

MiniCPM-2B-dpo-fp32
MiniCPM-2B-dpo-bf16

Solved in the 20240523 nightly build. Thanks.