Qwen1.5-7B-Chat CUDA error: out of memory
yinochaos opened this issue · comments
超 commented
Machine configuration: A800 80GB GPU, 360GB system RAM
Config file:
model:
type: inf-llm
path: Qwen/Qwen1.5-7B-Chat
block_size: 128
n_init: 128
n_local: 4096
topk: 16
repr_topk: 4
max_cached_block: 32
exc_block_size: 512
score_decay: 0.1
fattn: true
base: 1000000
distance_scale: 1.0
max_len: 2147483647
chunk_size: 512
conv_type: qwen
I modified pred to run inference; the input is roughly 280K tokens (no error occurs when the input is under 190K tokens).
Error message:
Traceback (most recent call last):
File "/root/data/user/XXXX/git/InfLLM/benchmark/common_pred.py", line 325, in <module>
preds = get_pred(
File "/root/data/user/XXXX/git/InfLLM/benchmark/common_pred.py", line 271, in get_pred
output = searcher.generate(
File "/root/data/user/XXXX/git/InfLLM/inf_llm/utils/greedy_search.py", line 32, in generate
result = self._decode(input_ids, **kwargs)
File "/root/data/user/XXXX/git/InfLLM/inf_llm/utils/greedy_search.py", line 54, in _decode
out = self.model(
File "/root/data/shared/group/common_tools/mambaforge/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/data/shared/group/common_tools/mambaforge/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/data/shared/group/common_tools/mambaforge/envs/infllm/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1173, in forward
outputs = self.model(
File "/root/data/shared/group/common_tools/mambaforge/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/data/shared/group/common_tools/mambaforge/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/data/user/XXXX/git/InfLLM/inf_llm/utils/patch.py", line 100, in model_forward
layer_outputs = decoder_layer(
File "/root/data/shared/group/common_tools/mambaforge/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/data/shared/group/common_tools/mambaforge/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/data/shared/group/common_tools/mambaforge/envs/infllm/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 773, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/data/shared/group/common_tools/mambaforge/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/data/shared/group/common_tools/mambaforge/envs/infllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/data/user/XXXX/git/InfLLM/inf_llm/utils/patch.py", line 16, in hf_forward
ret = forward(
File "/root/data/user/XXXX/git/InfLLM/inf_llm/attention/inf_llm.py", line 58, in forward
o = past_key_value.append(
File "/root/data/user/XXXX/git/InfLLM/inf_llm/attention/context_manager.py", line 725, in append
self.append_global(ed - st, kv_ed - kv_st, local_score)
File "/root/data/user/XXXX/git/InfLLM/inf_llm/attention/context_manager.py", line 620, in append_global
MemoryUnit(self.global_remainder[0][u, :, global_remainder_st:global_remainder_st + self.block_size, :],
File "/root/data/user/XXXX/git/InfLLM/inf_llm/attention/context_manager.py", line 34, in __init__
cpu_data = data.contiguous().to("cpu", non_blocking=True).pin_memory()
RuntimeError: CUDA error: out of memory
How can I fix this? The peak GPU memory usage I observed is only around 30+ GB, so where is the problem coming from?
Pengle Zhang commented
Hi, this may be a pin-memory problem; try removing the pin_memory call from MemoryUnit.
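A minimal sketch of that change (the function name here is hypothetical; the actual copy lives in `MemoryUnit.__init__` in `inf_llm/attention/context_manager.py`, line 34 of the traceback):

```python
import torch

def offload_to_cpu(data: torch.Tensor) -> torch.Tensor:
    # Original line from the traceback:
    #   cpu_data = data.contiguous().to("cpu", non_blocking=True).pin_memory()
    # pin_memory() asks the CUDA driver for page-locked host memory; that
    # allocation can itself fail with "CUDA error: out of memory" even when
    # plenty of device memory is free. Dropping it (and the non_blocking
    # flag, which is only effective with pinned memory) falls back to
    # ordinary pageable host memory:
    return data.contiguous().to("cpu")
```

The trade-off is that copies to pageable memory are synchronous and somewhat slower, but they do not consume the driver's page-locked allocation budget.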
超 commented
> Hi, this may be a pin-memory problem; try removing the pin_memory call from MemoryUnit.

Hi, I removed pin_memory, but I still get the same error:
cpu_data = data.contiguous().to("cpu", non_blocking=True)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
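As the error text itself suggests, a synchronous rerun can localize the call that actually failed. A sketch (the script path is taken from the traceback; its arguments are omitted here and should match your original run):

```shell
# Force synchronous CUDA launches so the traceback points at the CUDA call
# that actually failed, rather than a later unrelated API call. Example:
#   CUDA_LAUNCH_BLOCKING=1 python benchmark/common_pred.py <your args>
export CUDA_LAUNCH_BLOCKING=1
echo "$CUDA_LAUNCH_BLOCKING"  # prints "1"
```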
Other relevant environment info:
Python 3.10.14
Driver Version: 470.161.03 CUDA Version: 12.1
torch: 2.2.2+cu121
transformers: 4.39.2
Pengle Zhang commented
Sorry, we don't currently have an identical test environment, so we cannot reproduce your problem.
You could try a torch build for CUDA 11.8.
超 commented
> Sorry, we don't currently have an identical test environment, so we cannot reproduce your problem. You could try a torch build for CUDA 11.8.

OK, thanks.
ehuaa commented
@yinochaos Hi, could you share how you eventually solved this problem?