open-compass / opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Home Page: https://opencompass.org.cn/


[Bug] Error when using multiple GPUs

HelanHu opened this issue · comments

Prerequisites

Issue type

I am evaluating with an officially supported task/model/dataset.

Environment

python -c "import opencompass.utils;import pprint;pprint.pprint(dict(opencompass.utils.collect_env()))"
{'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda',
'GCC': 'gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0',
'GPU 0,1,2,3,4,5,6,7': 'NVIDIA A40',
'MMEngine': '0.10.2',
'NVCC': 'Cuda compilation tools, release 11.8, V11.8.89',
'OpenCV': '4.9.0',
'PyTorch': '2.1.1',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2023.1-Product Build 20230303 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.1.1 (Git Hash '
'64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX512\n'
' - CUDA Runtime 11.8\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37\n'
' - CuDNN 8.7\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=11.8, '
'CUDNN_VERSION=8.7.0, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
'-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-unused-function -Wno-unused-result '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wno-psabi '
'-Wno-error=pedantic -Wno-error=old-style-cast '
'-Wno-invalid-partial-specialization '
'-Wno-unused-private-field '
'-Wno-aligned-allocation-unavailable '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Werror=cast-function-type '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'PERF_WITH_AVX512=1, '
'TORCH_DISABLE_GPU_ASSERTS=ON, '
'TORCH_VERSION=2.1.1, USE_CUDA=ON, USE_CUDNN=ON, '
'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
'USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, '
'USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, '
'USE_OPENMP=ON, USE_ROCM=OFF, \n',
'Python': '3.10.12 (main, Jul 5 2023, 18:54:27) [GCC 11.2.0]',
'TorchVision': '0.16.1',
'numpy_random_seed': 2147483648,
'opencompass': '0.2.1+4f78388',
'sys.platform': 'linux'}

Reproduces the problem - code/configuration sample

from mmengine.config import read_base
from opencompass.models import HuggingFaceCausalLM

with read_base():
    from .datasets.ARC_c.ARC_c_gen import ARC_c_datasets

datasets = [*ARC_c_datasets]

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='llama-2-70b-hf',
        path="xxx/models/Llama-2-70b-hf",
        tokenizer_path='xxx/models/Llama-2-70b-hf',
        tokenizer_kwargs=dict(
            padding_side='left',
            truncation_side='left',
            use_fast=False,
        ),
        max_out_len=100,
        max_seq_len=2048,
        batch_size=1,
        model_kwargs=dict(device_map='auto'),
        batch_padding=False,  # if False, infer sample-by-sample without batch padding
        run_cfg=dict(num_gpus=4, num_procs=1),
    )
]

Reproduces the problem - command or script

python run.py /home/hhl/evaluation/opencompass/configs/eval_arc.py --debug

Reproduces the problem - error message

01/30 00:18:27 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
01/30 00:18:27 - OpenCompass - DEBUG - Modules of opencompass's partitioner registry have been automatically imported from opencompass.partitioners
01/30 00:18:27 - OpenCompass - DEBUG - Get class SizePartitioner from "partitioner" registry in "opencompass"
01/30 00:18:27 - OpenCompass - DEBUG - An SizePartitioner instance is built from registry, and its implementation can be found in opencompass.partitioners.size
01/30 00:18:27 - OpenCompass - DEBUG - Key eval.runner.task.judge_cfg not found in config, ignored.
01/30 00:18:27 - OpenCompass - DEBUG - Key eval.runner.task.dump_details not found in config, ignored.
01/30 00:18:27 - OpenCompass - DEBUG - Additional config: {}
01/30 00:18:27 - OpenCompass - INFO - Partitioned into 1 tasks.
01/30 00:18:27 - OpenCompass - DEBUG - Task 0: [llama-2-70b-hf/ARC-c]
01/30 00:18:27 - OpenCompass - DEBUG - Modules of opencompass's runner registry have been automatically imported from opencompass.runners
01/30 00:18:27 - OpenCompass - DEBUG - Get class LocalRunner from "runner" registry in "opencompass"
01/30 00:18:27 - OpenCompass - DEBUG - An LocalRunner instance is built from registry, and its implementation can be found in opencompass.runners.local
01/30 00:18:27 - OpenCompass - DEBUG - Modules of opencompass's task registry have been automatically imported from opencompass.tasks
01/30 00:18:27 - OpenCompass - DEBUG - Get class OpenICLInferTask from "task" registry in "opencompass"
01/30 00:18:27 - OpenCompass - DEBUG - An OpenICLInferTask instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_infer
01/30 00:18:31 - OpenCompass - INFO - Task [llama-2-70b-hf/ARC-c]
01/30 00:18:32 - OpenCompass - WARNING - pad_token_id is not set for the tokenizer.
01/30 00:18:32 - OpenCompass - WARNING - Using eos_token_id as pad_token_id.

Loading checkpoint shards: 0%| | 0/15 [00:00<?, ?it/s]
Loading checkpoint shards: 7%|▋ | 1/15 [00:01<00:22, 1.61s/it]
Loading checkpoint shards: 13%|█▎ | 2/15 [00:03<00:22, 1.76s/it]
Loading checkpoint shards: 20%|██ | 3/15 [00:05<00:20, 1.72s/it]
Loading checkpoint shards: 27%|██▋ | 4/15 [00:06<00:19, 1.76s/it]
Loading checkpoint shards: 33%|███▎ | 5/15 [00:08<00:17, 1.70s/it]
Loading checkpoint shards: 40%|████ | 6/15 [00:10<00:15, 1.71s/it]
Loading checkpoint shards: 47%|████▋ | 7/15 [00:12<00:13, 1.73s/it]
Loading checkpoint shards: 53%|█████▎ | 8/15 [00:13<00:12, 1.74s/it]
Loading checkpoint shards: 60%|██████ | 9/15 [00:15<00:10, 1.70s/it]
Loading checkpoint shards: 67%|██████▋ | 10/15 [00:17<00:08, 1.66s/it]
Loading checkpoint shards: 73%|███████▎ | 11/15 [00:18<00:06, 1.70s/it]
Loading checkpoint shards: 80%|████████ | 12/15 [00:20<00:05, 1.68s/it]
Loading checkpoint shards: 87%|████████▋ | 13/15 [00:22<00:03, 1.65s/it]
Loading checkpoint shards: 93%|█████████▎| 14/15 [00:23<00:01, 1.63s/it]
Loading checkpoint shards: 100%|██████████| 15/15 [00:23<00:00, 1.20s/it]
Loading checkpoint shards: 100%|██████████| 15/15 [00:23<00:00, 1.59s/it]
01/30 00:18:58 - OpenCompass - INFO - Start inferencing [llama-2-70b-hf/ARC-c]
[2024-01-30 00:18:59,241] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...

0%| | 0/1165 [00:00<?, ?it/s]/xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: do_sample is set to False. However, temperature is set to 0.6 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature.
warnings.warn(
/xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: do_sample is set to False. However, top_p is set to 0.9 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p.
warnings.warn(
/opt/conda/conda-bld/pytorch_1699449181202/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [17,0,0], thread: [32,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1699449181202/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [17,0,0], thread: [33,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
....
/opt/conda/conda-bld/pytorch_1699449181202/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [63,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.

0%| | 0/1165 [00:00<?, ?it/s]
Traceback (most recent call last):
File "xxx/opencompass/opencompass/tasks/openicl_infer.py", line 153, in
inferencer.run()
File "xxx/opencompass/opencompass/tasks/openicl_infer.py", line 81, in run
self._inference()
File "xxx/opencompass/opencompass/tasks/openicl_infer.py", line 126, in _inference
inferencer.inference(retriever,
File "/xxx/opencompass/opencompass/openicl/icl_inferencer/icl_gen_inferencer.py", line 146, in inference
results = self.model.generate_from_template(
File "xxx/opencompass/opencompass/models/base.py", line 165, in generate_from_template
return self.generate(inputs, max_out_len=max_out_len, **kwargs)
File "xxx/opencompass/opencompass/models/huggingface.py", line 250, in generate
return sum(
File "xxx/opencompass/opencompass/models/huggingface.py", line 251, in
(self.single_generate(inputs=[input],
File "xxx/opencompass/opencompass/models/huggingface.py", line 407, in _single_generate
outputs = self.model.generate(input_ids=input_ids,
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/utils.py", line 1596, in generate
return self.greedy_search(
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/utils.py", line 2444, in greedy_search
outputs = self(
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 809, in forward
outputs = self.model(
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 697, in forward
layer_outputs = decoder_layer(
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 413, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 322, in forward
query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 184, in apply_rotary_pos_emb
cos = cos[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim]
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

[2024-01-30 00:19:04,249] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 2906944) of binary: xxx/anaconda3/envs/opencompass/bin/python
Traceback (most recent call last):
File "xxx/anaconda3/envs/opencompass/bin/torchrun", line 33, in
sys.exit(load_entry_point('torch==2.1.1', 'console_scripts', 'torchrun')())
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 806, in main
run(args)
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "xxx/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

xxx/evaluation/opencompass/opencompass/tasks/openicl_infer.py FAILED

Other information

When I use num_gpus=1 there is no error, but when I use num_gpus>1 I get this error.
I suspect it's a problem with the transformers library, but after switching through several versions I still can't fix it.
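
A possible way to localize the failure (not attempted in this report): device-side asserts surface asynchronously, so rerunning with synchronous kernel launches usually makes the stack trace point at the op that actually failed:

CUDA_LAUNCH_BLOCKING=1 python run.py /home/hhl/evaluation/opencompass/configs/eval_arc.py --debug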

I want to run Llama 2 70B. I tried the meta/llama weights, and I only have 4 GPUs with 44 GB each, but it seems that loading the 70B model only works on 8 cards. (In fp16, 70B parameters is roughly 140 GB of weights, which should in principle fit across 4 × 44 GB = 176 GB, though with little headroom for the KV cache and activations.)

Could it be a compatibility issue between your code and the transformers library?

I also tried to avoid this problem by using a non-Hugging Face model: as above, I want Llama 2 70B on my 4 × 44 GB GPUs, and I tried meta/llama, but loading the 70B model there also seems to require 8 cards.

So I hope this bug can be fixed soon.

We currently only support 8 GPUs for the 70B model; maybe you can try vLLM as the inference backend.
For example: https://github.com/open-compass/OpenCompass/blob/main/configs/models/wizardlm/vllm_wizardlm_70b_v1_0.py
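
For reference, a minimal sketch of such a config adapted to this issue's model, modeled on the linked WizardLM example (the VLLM class and the tensor_parallel_size kwarg follow that file and should be verified against your installed version):

from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='llama-2-70b-vllm',
        path='xxx/models/Llama-2-70b-hf',  # same local checkpoint as above
        # let vLLM shard the model across the 4 available GPUs
        model_kwargs=dict(tensor_parallel_size=4),
        max_out_len=100,
        max_seq_len=2048,
        batch_size=32,
        generation_kwargs=dict(temperature=0),
        run_cfg=dict(num_gpus=4, num_procs=1),
    )
]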