CUDA Error

Question

Wallong opened this issue 7 months ago

Hi, great work!
When I run the nerf_synthetic data, I get a CUDA error (log below). Is there a configuration setting I overlooked that is causing it?

2023-11-02 09:44:24.958 | INFO     | utils.writer:write_scalar_dicts:79 - lr:0.002592 step:23000 iter_time:0.01472163200378418 ETA:0:00:29 num_alive_ray:13716 rendering_samples_actual:269133 num_rays:39829 PSNR:37.34233474731445 total_loss:0.0007085531251505017 
2023-11-02 09:44:42.329 | INFO     | utils.writer:write_scalar_dicts:79 - lr:0.002592 step:24000 iter_time:0.012811899185180664 ETA:0:00:12 num_alive_ray:13679 rendering_samples_actual:261635 num_rays:40487 PSNR:37.36977005004883 total_loss:0.0007570512825623155 
2023-11-02 09:44:59.344 | INFO     | utils.writer:write_scalar_dicts:79 - lr:0.002592 step:25000 iter_time:0.01546168327331543 ETA:0:00:00 num_alive_ray:13266 rendering_samples_actual:263475 num_rays:38886 PSNR:37.54179382324219 total_loss:0.0006864252500236034 
Traceback (most recent call last):
  File "main.py", line 96, in <module>
    main()
  File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "main.py", line 56, in main
    trainer.fit()
  File "/home/wll/workspace/nerf/Tri-MipRF/trainer/trainer.py", line 140, in fit
    metrics, final_rb, target = self.eval_img(
  File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/wll/workspace/nerf/Tri-MipRF/trainer/trainer.py", line 168, in eval_img
    rb = self.model(
  File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wll/workspace/nerf/Tri-MipRF/neural_field/model/trimipRF.py", line 118, in forward
    return self.rendering(
  File "/home/wll/workspace/nerf/Tri-MipRF/neural_field/model/trimipRF.py", line 140, in rendering
    rgbs, sigmas = rgb_sigma_fn(t_starts, t_ends, ray_indices.long())
  File "/home/wll/workspace/nerf/Tri-MipRF/neural_field/model/trimipRF.py", line 115, in rgb_sigma_fn
    rgb = self.field.query_rgb(dir=t_dirs, embedding=feature)['rgb']
  File "/home/wll/workspace/nerf/Tri-MipRF/neural_field/field/trimipRF.py", line 97, in query_rgb
    self.mlp_head(h)
  File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/tinycudann-1.7-py3.8-linux-x86_64.egg/tinycudann/modules.py", line 189, in forward
    self.params.to(_torch_precision(self.native_tcnn_module.param_precision())).contiguous(),
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  In call to configurable 'main' (<function main at 0x7fa137334700>)
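
The error message above suggests passing CUDA_LAUNCH_BLOCKING=1 so that the failing kernel is reported synchronously. A minimal sketch of one way to do that from Python, assuming the entry point is the main.py seen in the traceback. The helper names are illustrative, and any arguments you normally pass to main.py would need to be appended:

```python
import os
import subprocess
import sys


def blocking_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # With this set, kernel launches are no longer asynchronous, so the
    # Python traceback points at the call that actually failed.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_blocking(command):
    """Run `command` (a list of strings) with CUDA_LAUNCH_BLOCKING=1."""
    return subprocess.run(command, env=blocking_env()).returncode


# Hypothetical usage; append whatever arguments you normally pass to main.py:
# rerun_blocking([sys.executable, "main.py"])
```

Expect the run to be noticeably slower with this setting, so it is only worth enabling while debugging.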

hrz · Answer 1 · Mon Nov 06 2023 22:57:02 GMT+0800 (China Standard Time)

Same bug here, waiting for a solution.

Hu Zhu · Answer 2 · Fri Nov 24 2023 10:19:22 GMT+0800 (China Standard Time)

same error...

Xiang, Le · Answer 3 · Fri Nov 24 2023 22:07:39 GMT+0800 (China Standard Time)

First, thanks for your valuable work. I hit the same error. My environment: PyTorch 2.0.1, Python 3.9.0, CUDA 11.7, tinycudann 1.7, GPU: RTX 3090. Is this an incompatibility problem? I hope you can help us fix it, thanks a lot. Also, how can I assign a specific GPU to run the task?
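
On the question of picking a GPU: a common approach, not specific to this repository, is to restrict which devices the process can see through the CUDA_VISIBLE_DEVICES environment variable before any CUDA library is initialized. A minimal sketch, where select_gpu is a hypothetical helper and the index 1 is only an example:

```python
import os


def select_gpu(index):
    """Expose only the GPU with the given physical index to this process."""
    # Must be set before torch (or any other CUDA library) initializes CUDA;
    # afterwards the selected GPU appears inside the process as device 0.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


select_gpu(1)  # assumption: the machine has a GPU with index 1
# import torch  # import CUDA libraries only after the variable is set
```

The same effect can be had without touching the code by setting the variable in the shell when launching, for example `CUDA_VISIBLE_DEVICES=1 python main.py` followed by your usual arguments.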

Xiang, Le · Answer 4 · Tue Nov 28 2023 21:18:24 GMT+0800 (China Standard Time)

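It seems to be caused by an incompatibility between tinycudann and the CUDA runtime version. It works for me with Python 3.9.0, PyTorch 1.13.1, CUDA 11.7, and tinycudann 1.7. The only difference from the failing setup above is PyTorch 1.13.1 instead of 2.0.1. I hope this helps you fix the error.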
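
To compare an environment against the versions listed above, a minimal sketch that reads installed package versions without importing the packages themselves. It assumes the distributions are registered under the names torch, tinycudann, and nerfacc; installed_versions is a hypothetical helper:

```python
from importlib import metadata


def installed_versions(packages=("torch", "tinycudann", "nerfacc")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

If PyTorch is installed, `torch.version.cuda` additionally reports the CUDA version that PyTorch build was compiled against, which is the number to compare with the toolkit used to build tinycudann.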

Hu Wenbo · Answer 5 · Thu Dec 21 2023 16:16:45 GMT+0800 (China Standard Time)

closed as solved

TerryYang · Answer 6 · Thu Jan 11 2024 23:01:36 GMT+0800 (China Standard Time)

I've encountered the following problem. Is it because of the version of tinycudann? My version is the same as yours. Did your code run successfully?

# Parameters for TriMipRF:
# ==============================================================================
TriMipRF.feature_dim = 16
TriMipRF.geo_feat_dim = 15
TriMipRF.n_levels = 8
TriMipRF.net_depth_base = 2
TriMipRF.net_depth_color = 4
TriMipRF.net_width = 128
TriMipRF.plane_size = 512

# Parameters for TriMipRFModel:
# ==============================================================================
TriMipRFModel.occ_grid_resolution = 128
TriMipRFModel.samples_per_ray = 1024

2024-01-12 14:33:35.438 | INFO     | trainer.trainer:fit:106 - ==> Start training ...

NerfAcc: No CUDA toolkit found. NerfAcc will be disabled.
Traceback (most recent call last):
  File "/media/yangtongyu/T9/code1/Tri-MipRF-main/main.py", line 99, in <module>
    main()
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/media/yangtongyu/T9/code1/Tri-MipRF-main/main.py", line 56, in main
    trainer.fit()
  File "/media/yangtongyu/T9/code1/Tri-MipRF-main/trainer/trainer.py", line 113, in fit
    self.model.before_iter(step)
  File "/media/yangtongyu/T9/code1/Tri-MipRF-main/neural_field/model/trimipRF.py", line 41, in before_iter
    self.ray_sampler.every_n_step(
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/grid.py", line 271, in every_n_step
    self._update(
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/grid.py", line 224, in _update
    x = contract_inv(
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/contraction.py", line 101, in contract_inv
    ctype = type.to_cpp_version()
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/contraction.py", line 62, in to_cpp_version
    return _C.ContractionTypeGetter(self.value)
  File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/cuda/__init__.py", line 13, in call_cuda
    return getattr(_C, name)(*args, **kwargs)
AttributeError: 'NoneType' object has no attribute 'ContractionType'
  In call to configurable 'main' (<function main at 0x7fd229f57b80>)

Xiang, Le · Answer 7 · Thu Jan 11 2024 23:14:43 GMT+0800 (China Standard Time)

Yes, the versions above work on my 3090. Your problem seems to be caused by NerfAcc: perhaps you didn't install the CUDA toolkit, or didn't add its path to your system PATH. You can run "nvcc --version" to check whether it has been added.
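
The nvcc check can also be scripted. A minimal sketch, assuming a Linux-style layout where nvcc lives in the toolkit's bin directory; looking under CUDA_HOME is an assumption about how the toolkit was installed, and find_nvcc is a hypothetical helper:

```python
import os
import shutil
import subprocess


def find_nvcc():
    """Return the path to nvcc if it is on PATH or under CUDA_HOME, else None."""
    path = shutil.which("nvcc")
    if path:
        return path
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate):
            return candidate
    return None


nvcc = find_nvcc()
if nvcc is None:
    print("nvcc not found: install the CUDA toolkit or add its bin directory to PATH")
else:
    # Same check as running `nvcc --version` in a shell.
    print(subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout)
```

If nvcc is only found through CUDA_HOME and not through PATH, adding that bin directory to PATH is the likely fix for the "No CUDA toolkit found" message.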

TerryYang · Answer 8 · Fri Jan 12 2024 10:17:45 GMT+0800 (China Standard Time)

Thank you so much for your kind reply! The cause was that the path to nvcc was not found. My problem is solved!