InternLM / InternLM

Official release of InternLM2 7B and 20B base and chat models, with 200K context support.

Home Page: https://internlm.intern-ai.org.cn/

[QA] CUDA error: device-side assert triggered when running with transformers >= 4.34

zhulinJulia24 opened this issue

Describe the bug

Using the demo script; the model is internlm2-chat-20b. internlm2-chat-7b works fine.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import fire

def main(model_path: str):
    print ("start")
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    # `torch_dtype=torch.float16` 可以令模型以 float16 精度加载,否则 transformers 会将模型加载为 float32,导致显存不足
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
    model = model.eval()
    print ("end")

    for i in range(100):
        response, history = model.chat(tokenizer, "你好", history=[])
        print("single response:",response)
        # 你好!有什么我可以帮助你的吗?
        response, history = model.chat(tokenizer, "请提供三个管理时间的建议。", history=history)
        print("advance response:",response)


if __name__ == '__main__':
    fire.Fire(main)
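
Since the script dispatches through fire.Fire(main), the model path is passed on the command line, e.g. (the path is illustrative):

python transformers_func_regression_simple.py --model_path /models/internlm2-chat-20b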

Error when running:

Traceback (most recent call last):
  File "/home/zhulin1/lmdeploy/autotest/transformers_func_regression_simple.py", line 32, in <module>
    fire.Fire(main)
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/zhulin1/lmdeploy/autotest/transformers_func_regression_simple.py", line 16, in main
    response, history = model.chat(tokenizer, "你好", history=[])
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/zhulin1/.cache/huggingface/modules/transformers_modules/internlm2-chat-7b/modeling_internlm2.py", line 1168, in chat
    outputs = self.generate(
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/transformers/generation/utils.py", line 1652, in generate
    return self.sample(
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/transformers/generation/utils.py", line 2734, in sample
    outputs = self(
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhulin1/.cache/huggingface/modules/transformers_modules/internlm2-chat-7b/modeling_internlm2.py", line 1047, in forward
    outputs = self.model(
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhulin1/.cache/huggingface/modules/transformers_modules/internlm2-chat-7b/modeling_internlm2.py", line 890, in forward
    attention_mask = self._prepare_decoder_attention_mask(
  File "/home/zhulin1/.cache/huggingface/modules/transformers_modules/internlm2-chat-7b/modeling_internlm2.py", line 814, in _prepare_decoder_attention_mask
    combined_attention_mask = _make_causal_mask(
  File "/home/zhulin1/.cache/huggingface/modules/transformers_modules/internlm2-chat-7b/modeling_internlm2.py", line 88, in _make_causal_mask
    mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=device), device=device)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
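
As the message itself notes, the assert is reported asynchronously, so the stack above may not point at the real failing call. A minimal way to localize it, as the error suggests, is to force synchronous kernel launches (the variable must be set before torch initializes CUDA, e.g. at the top of the script):

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # launches become synchronous; the error surfaces at the real call site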


Environment

transformers 4.34.0, 4.34.1, and 4.37.1 all fail with this error; 4.33.0 works.

CUDA and torch info:
cuda: 11.8
torch: 2.1.0
tokenizers: 0.14.1
A100 80G

Other information

No response

It might be running out of GPU memory; could you try it with two GPUs?

I added device_map="auto" and ran on 8 GPUs, but hit the same error.
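
For reference, a sketch of that multi-GPU load, replacing the model-loading lines in the demo script above. With device_map="auto", transformers lets accelerate shard the weights across all visible GPUs, so the explicit .cuda() call is dropped:

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",  # accelerate places layers across all visible GPUs
    trust_remote_code=True,
)
model = model.eval()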

It also reproduces in a newer conda environment. Environment info:

# packages in environment at /home/zhulin1/miniconda3/envs/testpy38:
#
# Name                    Version                   Build  Channel

_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
accelerate 0.26.1 pypi_0 pypi
ca-certificates 2023.12.12 h06a4308_0
certifi 2022.12.7 pypi_0 pypi
charset-normalizer 2.1.1 pypi_0 pypi
cmake 3.25.0 pypi_0 pypi
einops 0.7.0 pypi_0 pypi
filelock 3.9.0 pypi_0 pypi
fsspec 2023.12.2 pypi_0 pypi
huggingface-hub 0.17.3 pypi_0 pypi
idna 3.4 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
libedit 3.1.20230828 h5eee18b_0
libffi 3.2.1 hf484d3e_1007
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
lit 15.0.7 pypi_0 pypi
markupsafe 2.1.3 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 3.0 pypi_0 pypi
numpy 1.24.1 pypi_0 pypi
openssl 1.1.1w h7f8727e_0
packaging 23.2 pypi_0 pypi
pillow 9.3.0 pypi_0 pypi
pip 23.3.1 py38h06a4308_0
psutil 5.9.8 pypi_0 pypi
python 3.8.0 h0371630_2
pyyaml 6.0.1 pypi_0 pypi
readline 7.0 h7b6447c_5
regex 2023.12.25 pypi_0 pypi
requests 2.28.1 pypi_0 pypi
safetensors 0.4.2 pypi_0 pypi
sentencepiece 0.1.99 pypi_0 pypi
setuptools 68.2.2 py38h06a4308_0
sqlite 3.33.0 h62c20be_0
sympy 1.12 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tokenizers 0.14.1 pypi_0 pypi
torch 2.0.1+cu118 pypi_0 pypi
torchaudio 2.0.2+cu118 pypi_0 pypi
torchvision 0.15.2+cu118 pypi_0 pypi
tqdm 4.66.1 pypi_0 pypi
transformers 4.34.0 pypi_0 pypi
triton 2.0.0 pypi_0 pypi
typing-extensions 4.4.0 pypi_0 pypi
urllib3 1.26.13 pypi_0 pypi
wheel 0.41.2 py38h06a4308_0
xz 5.4.5 h5eee18b_0
zlib 1.2.13 h5eee18b_0

The model directory contained a tokenizer.json generated from an older model, which caused the problem. tokenizer.json is not required by this model; deleting it resolves the error.
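
A minimal sketch of that workaround (model_path is illustrative): deleting the stale tokenizer.json makes AutoTokenizer rebuild the tokenizer from the model's own files, avoiding token ids from the old vocabulary that fall outside the model's embedding range, a plausible trigger for the device-side assert seen above.

import os

model_path = "/models/internlm2-chat-20b"  # illustrative: local copy of the checkpoint
stale_tokenizer = os.path.join(model_path, "tokenizer.json")
if os.path.exists(stale_tokenizer):
    os.remove(stale_tokenizer)  # AutoTokenizer then falls back to the model's own tokenizer files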