InternLM / InternLM

Official release of InternLM2 7B and 20B base and chat models, with 200K context support.

Home Page: https://internlm.intern-ai.org.cn/

[QA] CUDA error: device-side assert triggered when running with transformers >= 4.34

zhulinJulia24 opened this issue

Describe the bug

Using the demo script; the model is internlm2-chat-20b. internlm2-chat-7b works fine.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import fire

def main(model_path: str):
    print ("start")
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    # `torch_dtype=torch.float16` 可以令模型以 float16 精度加载,否则 transformers 会将模型加载为 float32,导致显存不足
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
    model = model.eval()
    print ("end")

    for i in range(100):
        response, history = model.chat(tokenizer, "你好", history=[])
        print("single response:",response)
        # 你好!有什么我可以帮助你的吗?
        response, history = model.chat(tokenizer, "请提供三个管理时间的建议。", history=history)
        print("advance response:",response)


if __name__ == '__main__':
    fire.Fire(main)
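
Since the script dispatches through fire.Fire(main), the model path is passed on the command line, e.g. (the path is illustrative):

python transformers_func_regression_simple.py --model_path /models/internlm2-chat-20b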

Error when running:

Traceback (most recent call last):
  File "/home/zhulin1/lmdeploy/autotest/transformers_func_regression_simple.py", line 32, in <module>
    fire.Fire(main)
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/zhulin1/lmdeploy/autotest/transformers_func_regression_simple.py", line 16, in main
    response, history = model.chat(tokenizer, "你好", history=[])
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/zhulin1/.cache/huggingface/modules/transformers_modules/internlm2-chat-7b/modeling_internlm2.py", line 1168, in chat
    outputs = self.generate(
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/transformers/generation/utils.py", line 1652, in generate
    return self.sample(
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/transformers/generation/utils.py", line 2734, in sample
    outputs = self(
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhulin1/.cache/huggingface/modules/transformers_modules/internlm2-chat-7b/modeling_internlm2.py", line 1047, in forward
    outputs = self.model(
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhulin1/.cache/huggingface/modules/transformers_modules/internlm2-chat-7b/modeling_internlm2.py", line 890, in forward
    attention_mask = self._prepare_decoder_attention_mask(
  File "/home/zhulin1/.cache/huggingface/modules/transformers_modules/internlm2-chat-7b/modeling_internlm2.py", line 814, in _prepare_decoder_attention_mask
    combined_attention_mask = _make_causal_mask(
  File "/home/zhulin1/.cache/huggingface/modules/transformers_modules/internlm2-chat-7b/modeling_internlm2.py", line 88, in _make_causal_mask
    mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=device), device=device)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
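
As the message itself notes, the assert is reported asynchronously, so the stack above may not point at the real failing call. A minimal way to localize it, as the error suggests, is to force synchronous kernel launches (the variable must be set before torch initializes CUDA, e.g. at the top of the script):

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # launches become synchronous; the error surfaces at the real call site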


Environment

transformers 4.34.0, 4.34.1, and 4.37.1 all fail with this error; 4.33.0 works.

CUDA and torch info:
cuda: 11.8
torch: 2.1.0
tokenizers: 0.14.1
A100 80G

Other information

No response

It might be running out of GPU memory; could you try it with two GPUs?

I added device_map="auto" and ran on 8 GPUs, but hit the same error.
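
For reference, a sketch of that multi-GPU load, replacing the model-loading lines in the demo script above. With device_map="auto", transformers lets accelerate shard the weights across all visible GPUs, so the explicit .cuda() call is dropped:

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",  # accelerate places layers across all visible GPUs
    trust_remote_code=True,
)
model = model.eval()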

It also reproduces in a newer conda environment. Environment info:

# packages in environment at /home/zhulin1/miniconda3/envs/testpy38:
#
# Name                    Version                   Build  Channel

_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
accelerate 0.26.1 pypi_0 pypi
ca-certificates 2023.12.12 h06a4308_0
certifi 2022.12.7 pypi_0 pypi
charset-normalizer 2.1.1 pypi_0 pypi
cmake 3.25.0 pypi_0 pypi
einops 0.7.0 pypi_0 pypi
filelock 3.9.0 pypi_0 pypi
fsspec 2023.12.2 pypi_0 pypi
huggingface-hub 0.17.3 pypi_0 pypi
idna 3.4 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
libedit 3.1.20230828 h5eee18b_0
libffi 3.2.1 hf484d3e_1007
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
lit 15.0.7 pypi_0 pypi
markupsafe 2.1.3 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 3.0 pypi_0 pypi
numpy 1.24.1 pypi_0 pypi
openssl 1.1.1w h7f8727e_0
packaging 23.2 pypi_0 pypi
pillow 9.3.0 pypi_0 pypi
pip 23.3.1 py38h06a4308_0
psutil 5.9.8 pypi_0 pypi
python 3.8.0 h0371630_2
pyyaml 6.0.1 pypi_0 pypi
readline 7.0 h7b6447c_5
regex 2023.12.25 pypi_0 pypi
requests 2.28.1 pypi_0 pypi
safetensors 0.4.2 pypi_0 pypi
sentencepiece 0.1.99 pypi_0 pypi
setuptools 68.2.2 py38h06a4308_0
sqlite 3.33.0 h62c20be_0
sympy 1.12 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tokenizers 0.14.1 pypi_0 pypi
torch 2.0.1+cu118 pypi_0 pypi
torchaudio 2.0.2+cu118 pypi_0 pypi
torchvision 0.15.2+cu118 pypi_0 pypi
tqdm 4.66.1 pypi_0 pypi
transformers 4.34.0 pypi_0 pypi
triton 2.0.0 pypi_0 pypi
typing-extensions 4.4.0 pypi_0 pypi
urllib3 1.26.13 pypi_0 pypi
wheel 0.41.2 py38h06a4308_0
xz 5.4.5 h5eee18b_0
zlib 1.2.13 h5eee18b_0

The model directory contained a tokenizer.json generated from an older model, which caused the problem. tokenizer.json is not required by this model; deleting it resolves the error.
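
A minimal sketch of that workaround (model_path is illustrative): deleting the stale tokenizer.json makes AutoTokenizer rebuild the tokenizer from the model's own files, avoiding token ids from the old vocabulary that fall outside the model's embedding range, a plausible trigger for the device-side assert seen above.

import os

model_path = "/models/internlm2-chat-20b"  # illustrative: local copy of the checkpoint
stale_tokenizer = os.path.join(model_path, "tokenizer.json")
if os.path.exists(stale_tokenizer):
    os.remove(stale_tokenizer)  # AutoTokenizer then falls back to the model's own tokenizer files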