huggingface / optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools

Home Page: https://huggingface.co/docs/optimum/main/en/intel/index


Core dump on Ubuntu 22.04.3 using optimum-intel 1.16.0.dev0+d2f9fdb

qiyuangong opened this issue

Error message

The cos_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
The sin_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
/home/intel/anaconda3/envs/speculative/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:1068: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_length > self.causal_mask.shape[-1]:
terminate called after throwing an instance of 'c10::Error'
  what():  [enforce fail at inline_container.cc:595] . unexpected pos 22847400768 vs 22847400664
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x55 (0x7fe25cd4d6b5 in /home/intel/anaconda3/envs/speculative/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x41f849f (0x7fe2489cf49f in /home/intel/anaconda3/envs/speculative/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #2: mz_zip_writer_add_mem_ex_v2 + 0x5c5 (0x7fe2489c9be5 in /home/intel/anaconda3/envs/speculative/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #3: caffe2::serialize::PyTorchStreamWriter::writeRecord(std::string const&, void const*, unsigned long, bool) + 0xdf (0x7fe2489d465f in /home/intel/anaconda3/envs/speculative/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #4: caffe2::serialize::PyTorchStreamWriter::writeEndOfFile() + 0x923 (0x7fe2489d5783 in /home/intel/anaconda3/envs/speculative/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #5: caffe2::serialize::PyTorchStreamWriter::~PyTorchStreamWriter() + 0x13d (0x7fe2489d5c6d in /home/intel/anaconda3/envs/speculative/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x1663801 (0x7fe245e3a801 in /home/intel/anaconda3/envs/speculative/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xa33360 (0x7fe25c336360 in /home/intel/anaconda3/envs/speculative/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0x41bcbf (0x7fe25bd1ecbf in /home/intel/anaconda3/envs/speculative/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #9: python() [0x507387]
<omitting python frames>
frame #11: python() [0x5052a0]
frame #14: python() [0x4e6b2a]
frame #15: python() [0x50508d]
frame #17: python() [0x4e6b2a]
frame #20: python() [0x4e6b2a]
frame #21: python() [0x50508d]
frame #24: python() [0x4e6b2a]
frame #25: python() [0x50508d]
frame #27: python() [0x4e6b2a]
frame #31: python() [0x5c1dc7]
frame #32: python() [0x5bddd0]
frame #33: python() [0x45674e]
frame #37: <unknown function> + 0x29d90 (0x7fe25d926d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #38: __libc_start_main + 0x80 (0x7fe25d926e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #39: python() [0x5885ce]
Kernel: 5.15.0-101-generic
optimum-intel: 1.16.0.dev0+d2f9fdb
Python: 3.9
gcc: 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
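
The frames above sit in caffe2::serialize::PyTorchStreamWriter, i.e. torch's archive writer that runs when a traced model is serialized to disk (the step that export=True presumably triggers). The following is a minimal sketch, not taken from the issue, that exercises the same trace-and-save path without optimum-intel; the model path and output file name are placeholders:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "/home/intel/Nvm/models/Llama-2-7b-chat-hf"  # placeholder path
model = AutoModelForCausalLM.from_pretrained(model_id, torchscript=True)
model.config.use_cache = False  # keep the traced outputs to plain tensors
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is AI?", return_tensors="pt")
with torch.inference_mode():
    # Trace the forward pass, then serialize it; the reported crash happens
    # while PyTorchStreamWriter writes the resulting archive to disk.
    traced = torch.jit.trace(model, (inputs["input_ids"],), strict=False)
    torch.jit.save(traced, "traced_llama.pt")  # placeholder output file

If this sketch fails with the same inline_container.cc error, the problem lies in torch / intel-extension-for-pytorch serialization rather than in optimum-intel.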

Install command

python -m pip install "optimum-intel[extras]"@git+https://github.com/huggingface/optimum-intel.git
Package                       Version
----------------------------- -------------------
absl-py                       2.0.0
accelerate                    0.21.0
aiohttp                       3.9.1
aiosignal                     1.3.1
antlr4-python3-runtime        4.9.3
async-timeout                 4.0.3
attrs                         23.1.0
certifi                       2023.7.22
charset-normalizer            3.3.2
click                         8.1.7
colorama                      0.4.6
coloredlogs                   15.0.1
cpuid-native                  0.0.8
datasets                      2.16.0
dill                          0.3.7
einops                        0.7.0
exceptiongroup                1.1.3
filelock                      3.13.1
frozenlist                    1.4.1
fsspec                        2023.10.0
huggingface-hub               0.21.4
humanfriendly                 10.0
idna                          3.4
iniconfig                     2.0.0
intel-extension-for-pytorch   2.1.0+cpu
intel-openmp                  2024.0.1
Jinja2                        3.1.2
joblib                        1.3.2
MarkupSafe                    2.1.3
mpmath                        1.3.0
multidict                     6.0.4
multiprocess                  0.70.15
networkx                      3.0
nltk                          3.8.1
numpy                         1.26.1
omegaconf                     2.3.0
onnx                          1.15.0
optimum                       1.18.0.dev0
optimum-intel                 1.16.0.dev0+086fae3
packaging                     23.2
pandas                        2.1.3
pip                           23.3
pluggy                        1.3.0
protobuf                      4.25.1
psutil                        5.9.6
py-cpuinfo                    9.0.0
pyarrow                       14.0.2
pyarrow-hotfix                0.6
pytest                        7.4.3
python-dateutil               2.8.2
pytz                          2023.3.post1
PyYAML                        6.0.1
regex                         2023.10.3
requests                      2.31.0
rouge-score                   0.1.2
safetensors                   0.4.2
scipy                         1.12.0
sentencepiece                 0.1.99
setuptools                    68.0.0
six                           1.16.0
sympy                         1.12
tabulate                      0.9.0
tiktoken                      0.5.1
tokenizers                    0.15.2
tomli                         2.0.1
torch                         2.1.0+cpu
tqdm                          4.66.1
transformers                  4.38.2
transformers-stream-generator 0.0.4
typing_extensions             4.8.0
tzdata                        2023.3
urllib3                       2.0.7
wheel                         0.41.2
xxhash                        3.4.1
yarl                          1.9.4
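
Since torch and intel-extension-for-pytorch ship as matched builds (both 2.1.0+cpu in the list above), a quick version check helps rule out a mismatched pair before digging further. An illustrative snippet, not part of the original report:

import torch
import intel_extension_for_pytorch as ipex
from importlib.metadata import version

print("torch:", torch.__version__)                       # expected: 2.1.0+cpu
print("intel-extension-for-pytorch:", ipex.__version__)  # expected: 2.1.0+cpu
print("transformers:", version("transformers"))
print("optimum-intel:", version("optimum-intel"))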

Code based on


import torch
import time
import argparse
from optimum.intel import IPEXModelForCausalLM

import intel_extension_for_pytorch as ipex
from transformers import LlamaTokenizer, AutoModelForCausalLM

# you could tune the prompt based on your own model,
# here the prompt tuning refers to https://huggingface.co/georgesung/llama2_7b_chat_uncensored#prompt-style
LLAMA2_PROMPT_FORMAT = """### HUMAN:
{prompt}

### RESPONSE:
"""

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for Llama2 model')
    parser.add_argument('--repo-id-or-model-path', type=str, default="/home/intel/Nvm/models/Llama-2-7b-chat-hf",
                        help='The huggingface repo id for the Llama2 (e.g. `meta-llama/Llama-2-7b-chat-hf` and `meta-llama/Llama-2-13b-chat-hf`) to be downloaded'
                             ', or the path to the huggingface checkpoint folder')
    parser.add_argument('--assistant-model-path', type=str, default=None,
                        help='The huggingface repo id for the assistant model (e.g. `meta-llama/Llama-2-7b-chat-hf` and `meta-llama/Llama-2-13b-chat-hf`) to be downloaded'
                             ', or the path to the huggingface checkpoint folder')
    parser.add_argument('--prompt', type=str, default="What is AI?",
                        help='Prompt to infer')
    parser.add_argument('--n-predict', type=int, default=32,
                        help='Max tokens to predict')

    args = parser.parse_args()
    model_path = args.repo_id_or_model_path
    assistant_model_path = args.assistant_model_path
    if assistant_model_path is None:
        assistant_model_path = model_path

    assistant_model = AutoModelForCausalLM.from_pretrained(assistant_model_path, trust_remote_code=True)
    
    print("Assistant model loaded!")
    model = IPEXModelForCausalLM.from_pretrained(model_path, export=True)
    
    print("Main model loaded!")

    # Load tokenizer
    tokenizer = LlamaTokenizer.from_pretrained(model_path, trust_remote_code=True)

    # Generate predicted tokens
    with torch.inference_mode():
        prompt = LLAMA2_PROMPT_FORMAT.format(prompt=args.prompt)
        input_ids = tokenizer.encode(prompt, return_tensors="pt")
        st = time.time()
        # if your selected model is capable of utilizing previous key/value attentions
        # to enhance decoding speed, but has `"use_cache": false` in its model config,
        # it is important to set `use_cache=True` explicitly in the `generate` function
        output = model.generate(input_ids,
                                max_new_tokens=args.n_predict,
                                assistant_model=assistant_model)
        end = time.time()
        output_str = tokenizer.decode(output[0], skip_special_tokens=True)
        print(f'Inference time: {end-st} s')
        print('-'*20, 'Prompt', '-'*20)
        print(prompt)
        print('-'*20, 'Output', '-'*20)
        print(output_str)
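
For comparison, the same assisted generation can be run with a plain transformers model, skipping the IPEX export step entirely. This is a hypothetical variant of the script above (same placeholder model path), useful only to check whether the core dump is tied to export=True rather than to generation itself:

import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

model_path = "/home/intel/Nvm/models/Llama-2-7b-chat-hf"  # placeholder path

# Both the main and the assistant model are plain (non-exported) models here.
model = AutoModelForCausalLM.from_pretrained(model_path)
assistant_model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = LlamaTokenizer.from_pretrained(model_path)

prompt = "### HUMAN:\nWhat is AI?\n\n### RESPONSE:\n"
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids,
                            max_new_tokens=32,
                            assistant_model=assistant_model)
    print(tokenizer.decode(output[0], skip_special_tokens=True))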

Hi @qiyuangong, the error you're reporting doesn't seem related to optimum-intel. Could you uninstall and reinstall torch / intel-extension-for-pytorch following these instructions?


Thank you @echarlaix! :)
Yes, this error seems to be raised by intel-extension-for-pytorch. I will report it to them.

Issue closed.