Failed to run Qwen2 inference on multiple GPUs using Accelerate
pillowsofwind opened this issue
Rongwu Xu commented
Hello,
I am trying to use `accelerate==0.32.1`
to speed up batch inference on GPUs.
However, I run into the following error when using multiple GPUs:
...
File "/data/conda_envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/conda_envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/data/conda_envs/inference/lib/python3.9/site-packages/accelerate/hooks.py", line 169, in new_forward
output = module._old_forward(*args, **kwargs)
File "/data/conda_envs/inference/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1235, in forward
logits = self.lm_head(hidden_states)
File "/data/conda_envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data/conda_envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/data/conda_envs/inference/lib/python3.9/site-packages/accelerate/hooks.py", line 169, in new_forward
output = module._old_forward(*args, **kwargs)
File "/data/conda_envs/inference/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 116, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
...
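The repeated assertion comes from PyTorch's CUDA indexing kernel. It checks that every index satisfies `-sizes[i] <= index < sizes[i]`, so negative indices down to `-size` are accepted and `size` itself is not. Because CUDA kernels run asynchronously, an error can be reported at a later call than the kernel that actually failed (here it surfaces in the cuBLAS GEMM inside `lm_head`). Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` makes errors surface at the failing call. The sketch below is only a pure-Python restatement of that bounds condition; `index_in_bounds` and `find_out_of_bounds` are hypothetical helpers, not PyTorch APIs, that could be used, for example, to check token IDs against an embedding size on the CPU before moving them to a GPU.

```python
# Pure-Python restatement of the condition asserted in IndexKernel.cu:
#   -sizes[i] <= index && index < sizes[i]
# Hypothetical helpers for illustration, not part of PyTorch.


def index_in_bounds(index: int, size: int) -> bool:
    """Return True if `index` is valid (negative indices allowed) for a dim of length `size`."""
    return -size <= index < size


def find_out_of_bounds(indices, size):
    """Return the positions of the indices that would trip the assertion."""
    return [pos for pos, idx in enumerate(indices) if not index_in_bounds(idx, size)]


if __name__ == "__main__":
    # A dimension of length 4 accepts indices -4..3, so 4 and -5 are rejected.
    print(find_out_of_bounds([0, 3, -4, 4, -5], 4))  # → [3, 4]
```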
It seems to be an issue with `accelerate/hooks.py`,
but I don't know what the exact bug is.
My code works fine on a single GPU.
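The `accelerate/hooks.py` frames in the traceback do not by themselves locate the bug. As the `new_forward` → `module._old_forward` lines show, Accelerate replaces each dispatched module's `forward` with a wrapper that prepares the inputs (for example, moving them to the module's device) and then calls the original forward. Any error raised inside a layer, here `F.linear` in `lm_head`, therefore passes through a `hooks.py` frame. The toy below is a simplified, hypothetical re-creation of that wrapping pattern; `ToyModule`, `add_toy_hook` and `move_to_device` are illustrative names, not Accelerate's actual code or API, and strings stand in for tensors.

```python
# Simplified, hypothetical re-creation of the forward-wrapping pattern seen in
# the traceback (new_forward -> module._old_forward). Not Accelerate's real code.


class ToyModule:
    def __init__(self, name):
        self.name = name

    def forward(self, x):
        # Stand-in for the real layer computation (e.g. F.linear in lm_head).
        return f"{self.name}({x})"


def add_toy_hook(module, pre_forward):
    """Wrap module.forward so that `pre_forward` processes the inputs first."""
    module._old_forward = module.forward  # keep the original bound method

    def new_forward(*args, **kwargs):
        args, kwargs = pre_forward(module, *args, **kwargs)
        # Every call, and hence every traceback from the layer, goes through this frame.
        return module._old_forward(*args, **kwargs)

    module.forward = new_forward  # instance attribute shadows the class method
    return module


def move_to_device(module, *args, **kwargs):
    # Stand-in for moving tensors to the module's execution device.
    return tuple(f"{a}@cuda:1" for a in args), kwargs


if __name__ == "__main__":
    lm_head = add_toy_hook(ToyModule("lm_head"), move_to_device)
    print(lm_head.forward("hidden_states"))  # → lm_head(hidden_states@cuda:1)
```

In this pattern the wrapper only relays the call, so an exception raised by the wrapped layer, or by a device-side failure it triggers, appears with the hook frame on the stack even when the hook itself is correct.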
Ren Xuancheng commented
Hi, similar issues have been reported, and this is likely caused by the NVIDIA driver; please search the existing issues first.