[BUG] CUDA illegal memory access when batch_size >= 8
hua0x522 opened this issue
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
I tried running evaluate.py from the TorchSparse++ AE (artifact evaluation) code, which accepts a 'batch_size' flag. When I set batch_size to 8 or larger, it fails with "CUDA error: an illegal memory access was encountered". With batch_size from 1 to 6, it runs normally.
The error log is:
```
Traceback (most recent call last):
  File "evaluate.py", line 333, in <module>
    main()
  File "evaluate.py", line 250, in main
    _ = model(inputs["pts_input"])
  File "/home/wangxuezhu/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wangxuezhu/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wangxuezhu/torchsparse/evaluation/core/models/segmentation_models/minkunet.py", line 104, in forward
    x1 = self.stage1(x0)
  File "/home/wangxuezhu/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wangxuezhu/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wangxuezhu/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 215, in forward
    input = module(input)
  File "/home/wangxuezhu/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wangxuezhu/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wangxuezhu/torchsparse/evaluation/core/models/modules/layers_3d.py", line 42, in forward
    out = self.net(x)
  File "/home/wangxuezhu/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wangxuezhu/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wangxuezhu/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 215, in forward
    input = module(input)
  File "/home/wangxuezhu/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wangxuezhu/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wangxuezhu/code/torchsparse/torchsparse/nn/modules/activation.py", line 11, in forward
    return fapply(input, super().forward)
  File "/home/wangxuezhu/code/torchsparse/torchsparse/nn/utils/apply.py", line 13, in fapply
    feats = fn(input.feats, *args, **kwargs)
  File "/home/wangxuezhu/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 101, in forward
    return F.relu(input, inplace=self.inplace)
  File "/home/wangxuezhu/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/functional.py", line 1469, in relu
    result = torch.relu_(input)
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
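CUDA kernel errors are reported asynchronously, so the `torch.relu_` frame at the bottom of the traceback is not necessarily where the illegal access happened; an earlier sparse-convolution kernel may be the real culprit. The following is a minimal sketch for narrowing this down. It reruns the script with `CUDA_LAUNCH_BLOCKING=1` (so the traceback points at the faulting launch) and binary-searches the smallest failing batch size. It assumes that evaluate.py exits with a non-zero status on failure and that the flag is spelled `--batch_size`. That spelling is a hypothetical placeholder, so check the script's argument parser. The helper names `run_evaluate` and `first_failing_batch_size` are also illustrative.

```python
import os
import subprocess
import sys
from typing import Callable, Optional


def run_evaluate(batch_size: int, script: str = "evaluate.py") -> bool:
    """Run the evaluation script once; True if it exited cleanly.

    "--batch_size" is a hypothetical flag spelling; adapt it to evaluate.py.
    """
    env = dict(os.environ)
    # Make kernel launches synchronous so the traceback names the kernel that
    # actually faulted instead of a later op such as relu_.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    result = subprocess.run(
        [sys.executable, script, "--batch_size", str(batch_size)],
        env=env,
    )
    return result.returncode == 0


def first_failing_batch_size(
    runs_ok: Callable[[int], bool], low: int, high: int
) -> Optional[int]:
    """Binary-search the smallest batch size in [low, high] that fails.

    Assumes failures are monotonic (if b fails, every size > b fails too).
    Returns None if even `high` succeeds.
    """
    if runs_ok(high):
        return None
    # Invariant: `high` is known to fail; the answer lies in [low, high].
    while low < high:
        mid = (low + high) // 2
        if runs_ok(mid):
            low = mid + 1
        else:
            high = mid
    return low


if __name__ == "__main__":
    print(first_failing_batch_size(run_evaluate, 1, 8))
```

Since the report does not say whether batch_size 7 works, the search would also settle that case. If the crash turns out to depend on the input data rather than only on the batch size, the monotonicity assumption breaks and a plain loop over each size is the safer check.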
Expected Behavior
No response
Environment
- GCC: 11.4.0
- NVCC: 12.3
- PyTorch: 2.1.2
- PyTorch CUDA: 12.1
- TorchSparse: 2.1.0
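The list above does not include the GPU model, which can be relevant when chasing an illegal memory access. A small sketch that prints the same fields plus the GPU name might look like the following. It assumes `torchsparse` exposes a `__version__` attribute and falls back to "unknown" if it does not. The function name `collect_environment` is illustrative.

```python
import importlib
import platform
from typing import Dict


def collect_environment() -> Dict[str, str]:
    """Gather version info for a bug report; missing packages are noted."""
    info = {"Python": platform.python_version()}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        info["PyTorch"] = "not installed"
    else:
        info["PyTorch"] = str(torch.__version__)
        # torch.version.cuda is None on CPU-only builds.
        info["PyTorch CUDA"] = str(torch.version.cuda)
        if torch.cuda.is_available():
            info["GPU"] = torch.cuda.get_device_name(0)
    try:
        torchsparse = importlib.import_module("torchsparse")
    except ImportError:
        info["TorchSparse"] = "not installed"
    else:
        # Assumption: the package exposes __version__; fall back otherwise.
        info["TorchSparse"] = str(getattr(torchsparse, "__version__", "unknown"))
    return info


if __name__ == "__main__":
    for name, version in collect_environment().items():
        print(f"- {name}: {version}")
```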
Anything else?
No response
@ys-2020, could you please take a look at this issue when you have time? Thanks!