intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Home Page: https://ipex-llm.readthedocs.io

Qwen1.5-4b and Qwen1.5-7b models cannot be loaded correctly in ipex-llm version 20240522

grandxin opened this issue

I saved the qwen1.5-4b and 7b int4 models on my computer. When loading these models, I get the following errors:

Some weights of the model checkpoint at ./models/qwen1.5-4b were not used when initializing Qwen2ForCausalLM: ['model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.k_proj.weight', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.q_proj.weight', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.0.self_attn.v_proj.weight', ... (and the corresponding q_proj/k_proj/v_proj bias and weight entries for all 40 layers, 0-39)]

  • This IS expected if you are initializing Qwen2ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing Qwen2ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of Qwen2ForCausalLM were not initialized from the model checkpoint at ./models/qwen1.5-4b and are newly initialized: ['model.layers.0.self_attn.qkv_proj.bias', 'model.layers.0.self_attn.qkv_proj.weight', ... (and the corresponding qkv_proj bias and weight entries for all 40 layers, 0-39)]
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

But when I use ipex-llm version 20240520 or an earlier version, everything works fine.

We made a breaking change to Qwen-1.5's int4 checkpoint in the 2024-05-21 version. Old int4 checkpoints (generated by ipex-llm 20240520 or earlier) cannot be loaded with the new ipex-llm (20240521 or later); please regenerate the int4 checkpoint with ipex-llm 20240521 or later.
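
For reference, a minimal sketch of regenerating an int4 checkpoint in Python, assuming ipex-llm's documented low-bit save/load helpers (save_low_bit / load_low_bit); the model paths are placeholders:

from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

# Placeholders: point these at the original FP16 model and the target directory.
model_path = "Qwen/Qwen1.5-7B-Chat"
save_path = "./models/qwen1.5-7b-int4"

# Quantize to int4 while loading, then persist the low-bit checkpoint
# using ipex-llm 20240521 or later.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model.save_low_bit(save_path)
AutoTokenizer.from_pretrained(model_path, trust_remote_code=True).save_pretrained(save_path)

# Later runs load the regenerated int4 checkpoint directly.
model = AutoModelForCausalLM.load_low_bit(save_path, trust_remote_code=True)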

OK, got it.
Does the new version bring any improvements, such as quantization accuracy or RAM usage?

Yes, there should be some improvement in speed and RAM, but not much.

I regenerated the qwen-7b int4 model and ran it on my laptop (Ultra 7 155H), but the "warm up" stage takes a very long time (more than 5 minutes). Do you have any advice?

Did you set SYCL_CACHE_PERSISTENT=1? https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration

Yes, I have set it.
I found that warm-up is much faster in CPU mode (about 10-20 s) but slower in XPU mode.

The CPU doesn't need JIT compilation, while the GPU does.

On CPU: load model -> quantization -> inference

On GPU: load model -> quantization -> JIT compilation -> inference. This JIT compilation is what we call "warm-up", and it can take about ten minutes.
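
As a rough illustration (paths are placeholders, and this assumes a working ipex-llm XPU installation), the warm-up shows up as a slow first generate() call on XPU, while a second call reuses the compiled kernels:

import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.load_low_bit("./models/qwen1.5-7b-int4",
                                          trust_remote_code=True).to("xpu")
tokenizer = AutoTokenizer.from_pretrained("./models/qwen1.5-7b-int4",
                                          trust_remote_code=True)
inputs = tokenizer("Hello", return_tensors="pt").to("xpu")

# The first call includes JIT compilation (warm-up); the second does not.
for label in ("first run (includes JIT warm-up)", "second run"):
    start = time.time()
    with torch.inference_mode():
        model.generate(**inputs, max_new_tokens=32)
    print(f"{label}: {time.time() - start:.1f}s")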

Setting SYCL_CACHE_PERSISTENT=1 stores the GPU JIT code on disk so that it won't need to be compiled again the next time you run.

If you are using PowerShell, please use CMD instead.

Could you check whether C:\Users\<user name>\AppData\Roaming\libsycl_cache exists? If it exists, please delete it. Then set SYCL_CACHE_PERSISTENT=1 and run inference; this run will take a long time (about 10 minutes) because it needs to regenerate the JIT code cache. After it finishes, you should see a regenerated C:\Users\<user name>\AppData\Roaming\libsycl_cache. With the cache in place, subsequent inference should need no warm-up (setting SYCL_CACHE_PERSISTENT=1 is still required).
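
The steps above can be scripted; here is a sketch, under the assumption that setting the variable from Python before the model is loaded has the same effect as set SYCL_CACHE_PERSISTENT=1 in CMD:

import os
import shutil

# %APPDATA% expands to C:\Users\<user name>\AppData\Roaming on Windows.
cache_dir = os.path.expandvars(r"%APPDATA%\libsycl_cache")

# 1. Delete any stale JIT code cache.
if os.path.exists(cache_dir):
    shutil.rmtree(cache_dir)

# 2. Enable persistent caching before loading the model / touching the XPU.
os.environ["SYCL_CACHE_PERSISTENT"] = "1"

# 3. Run inference once (slow: it regenerates the JIT cache), then check that
#    the cache directory was recreated; later runs should skip warm-up.
# ... load the model and generate as usual ...
print("cache regenerated:", os.path.exists(cache_dir))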

OK, I will try it, thank you very much.
If libsycl_cache exists, then even if I finish the inference process, restart, and reload the model, is there no need for warm-up?

Yes, with the cache in place no warm-up is needed.