CUDA Error
Wallong opened this issue
Hi, great work!
When I run on the nerf_synthetic data, I get a CUDA error. Is there some configuration I overlooked that could be causing it?
2023-11-02 09:44:24.958 | INFO | utils.writer:write_scalar_dicts:79 - lr:0.002592 step:23000 iter_time:0.01472163200378418 ETA:0:00:29 num_alive_ray:13716 rendering_samples_actual:269133 num_rays:39829 PSNR:37.34233474731445 total_loss:0.0007085531251505017
2023-11-02 09:44:42.329 | INFO | utils.writer:write_scalar_dicts:79 - lr:0.002592 step:24000 iter_time:0.012811899185180664 ETA:0:00:12 num_alive_ray:13679 rendering_samples_actual:261635 num_rays:40487 PSNR:37.36977005004883 total_loss:0.0007570512825623155
2023-11-02 09:44:59.344 | INFO | utils.writer:write_scalar_dicts:79 - lr:0.002592 step:25000 iter_time:0.01546168327331543 ETA:0:00:00 num_alive_ray:13266 rendering_samples_actual:263475 num_rays:38886 PSNR:37.54179382324219 total_loss:0.0006864252500236034
Traceback (most recent call last):
File "main.py", line 96, in <module>
main()
File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/gin/config.py", line 1605, in gin_wrapper
utils.augment_exception_message_and_reraise(e, err_str)
File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
raise proxy.with_traceback(exception.__traceback__) from None
File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/gin/config.py", line 1582, in gin_wrapper
return fn(*new_args, **new_kwargs)
File "main.py", line 56, in main
trainer.fit()
File "/home/wll/workspace/nerf/Tri-MipRF/trainer/trainer.py", line 140, in fit
metrics, final_rb, target = self.eval_img(
File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/wll/workspace/nerf/Tri-MipRF/trainer/trainer.py", line 168, in eval_img
rb = self.model(
File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wll/workspace/nerf/Tri-MipRF/neural_field/model/trimipRF.py", line 118, in forward
return self.rendering(
File "/home/wll/workspace/nerf/Tri-MipRF/neural_field/model/trimipRF.py", line 140, in rendering
rgbs, sigmas = rgb_sigma_fn(t_starts, t_ends, ray_indices.long())
File "/home/wll/workspace/nerf/Tri-MipRF/neural_field/model/trimipRF.py", line 115, in rgb_sigma_fn
rgb = self.field.query_rgb(dir=t_dirs, embedding=feature)['rgb']
File "/home/wll/workspace/nerf/Tri-MipRF/neural_field/field/trimipRF.py", line 97, in query_rgb
self.mlp_head(h)
File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wll/miniconda3/envs/nerf/lib/python3.8/site-packages/tinycudann-1.7-py3.8-linux-x86_64.egg/tinycudann/modules.py", line 189, in forward
self.params.to(_torch_precision(self.native_tcnn_module.param_precision())).contiguous(),
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
In call to configurable 'main' (<function main at 0x7fa137334700>)
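As the error message itself suggests, a first debugging step is to rerun with CUDA_LAUNCH_BLOCKING=1, so that the failing kernel is reported at its actual call site instead of at a later API call. Below is a minimal sketch in Python. The helper name `blocking_env` is hypothetical, and the sketch assumes the variable needs to be in place before the process initializes CUDA, which is why it builds an environment for a fresh process rather than modifying a running one.

```python
import os


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    `base` defaults to the current process environment. The copy is meant to
    be passed to a newly started process (for example via subprocess.run).
    """
    env = dict(os.environ if base is None else base)
    # With this set, kernel launches run synchronously, so the Python
    # traceback points at the call that actually failed.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    # Hypothetical usage: rerun the training script with the same arguments
    # as before, e.g.
    #   subprocess.run([sys.executable, "main.py", *your_args], env=blocking_env())
    print(blocking_env({"PATH": "/usr/bin"}))
```

Expect training to be noticeably slower with this flag, so it is best used only to locate the failing call.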
The same bug here, waiting for a solution.
Same error...
First of all, thanks for your valuable work. I hit the same error. My setup: PyTorch 2.0.1, Python 3.9.0, CUDA 11.7, tinycudann 1.7, GPU: 3090. Is this an incompatibility problem? I hope you can help us fix it, thanks a lot. Also, how can I assign a specific GPU to run the task?
It seems to be caused by an incompatibility between tinycudann and the CUDA runtime version. It works for me with Python 3.9.0, PyTorch 1.13.1, CUDA 11.7, and tinycudann 1.7. I hope this helps you fix the error.
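To compare your own environment against the working combination in the reply above, it helps to print the installed versions in one place. The following is a rough sketch, not part of the Tri-MipRF code: the function names are hypothetical, and it assumes the packages are registered under the distribution names `torch`, `tinycudann`, and `nerfacc`.

```python
from importlib import metadata


def installed_versions(packages=("torch", "tinycudann", "nerfacc")):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


def torch_cuda_build():
    """Return the CUDA version PyTorch was built against, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    # torch.version.cuda is None for CPU-only builds.
    return torch.version.cuda


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
    print(f"CUDA version PyTorch was built with: {torch_cuda_build()}")
```

If the printed PyTorch version differs from the 1.13.1 reported to work above, that is the first thing to try changing.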
closed as solved
I've encountered the following problem. Is it caused by the tinycudann version? My versions are the same as yours. Did your code run successfully?
# Parameters for TriMipRF:
# ==============================================================================
TriMipRF.feature_dim = 16
TriMipRF.geo_feat_dim = 15
TriMipRF.n_levels = 8
TriMipRF.net_depth_base = 2
TriMipRF.net_depth_color = 4
TriMipRF.net_width = 128
TriMipRF.plane_size = 512
# Parameters for TriMipRFModel:
# ==============================================================================
TriMipRFModel.occ_grid_resolution = 128
TriMipRFModel.samples_per_ray = 1024
2024-01-12 14:33:35.438 | INFO | trainer.trainer:fit:106 - ==> Start training ...
NerfAcc: No CUDA toolkit found. NerfAcc will be disabled.
Traceback (most recent call last):
File "/media/yangtongyu/T9/code1/Tri-MipRF-main/main.py", line 99, in <module>
main()
File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/gin/config.py", line 1605, in gin_wrapper
utils.augment_exception_message_and_reraise(e, err_str)
File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
raise proxy.with_traceback(exception.__traceback__) from None
File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/gin/config.py", line 1582, in gin_wrapper
return fn(*new_args, **new_kwargs)
File "/media/yangtongyu/T9/code1/Tri-MipRF-main/main.py", line 56, in main
trainer.fit()
File "/media/yangtongyu/T9/code1/Tri-MipRF-main/trainer/trainer.py", line 113, in fit
self.model.before_iter(step)
File "/media/yangtongyu/T9/code1/Tri-MipRF-main/neural_field/model/trimipRF.py", line 41, in before_iter
self.ray_sampler.every_n_step(
File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/grid.py", line 271, in every_n_step
self._update(
File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/grid.py", line 224, in _update
x = contract_inv(
File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/contraction.py", line 101, in contract_inv
ctype = type.to_cpp_version()
File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/contraction.py", line 62, in to_cpp_version
return _C.ContractionTypeGetter(self.value)
File "/home/yangtongyu/software/anaconda3/envs/trimip/lib/python3.9/site-packages/nerfacc/cuda/__init__.py", line 13, in call_cuda
return getattr(_C, name)(*args, **kwargs)
AttributeError: 'NoneType' object has no attribute 'ContractionType'
In call to configurable 'main' (<function main at 0x7fd229f57b80>)
Yes, the versions above work on my 3090. Your problem seems to be caused by NerfAcc: perhaps you didn't install the CUDA toolkit, or didn't add its path to your system. You can run "nvcc --version" to test whether it has been added.
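The same check can be done from Python inside the environment that runs training, which is useful when the shell and the training process see different PATH values. This is a minimal sketch under stated assumptions: the helper name `find_nvcc` is hypothetical, and falling back to the `CUDA_HOME` and `CUDA_PATH` variables is a common convention rather than something confirmed for NerfAcc in this thread.

```python
import os
import shutil
import subprocess


def find_nvcc(env=None):
    """Return the path to nvcc if it can be located, otherwise None."""
    env = os.environ if env is None else env
    # First look on PATH, which is what "nvcc --version" in a shell relies on.
    found = shutil.which("nvcc", path=env.get("PATH", ""))
    if found:
        return found
    # Assumed fallback: a toolkit root given by CUDA_HOME or CUDA_PATH.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        root = env.get(var)
        if root:
            candidate = os.path.join(root, "bin", "nvcc")
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found: install the CUDA toolkit or add its bin directory to PATH")
    else:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        print(result.stdout)
```

If this reports that nvcc is missing while the toolkit is installed, adding the toolkit's bin directory to PATH before launching training is the fix described in this thread.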
Thank you so much for your kind reply! The cause was that the path to nvcc could not be found. My problem is solved!