CUDA 11.1 fails with "no kernel image is available for execution on the device"?

Question

vison20080808 opened this issue 2 years ago

1. nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0

2. pip list
Package Version Location

detectron2 0.6+cu111
torch 1.10.1+cu111
torchaudio 0.10.1+rocm4.1
torchvision 0.11.2+cu111
tqdm 4.63.1

3. Log:
sda/okay/ai/project/ocrquesseg-svr/detectron2/detectron2/structures/boxes.py:158: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:201.)
tensor = torch.as_tensor(tensor, dtype=torch.float32, device=device)
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
/sda/program/anaconda3/envs/detectron2/lib/python3.7/site-packages/torch/nn/functional.py:3847: UserWarning: nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample_bilinear is deprecated. Use nn.functional.interpolate instead.")
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device

error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
error in modulated_deformable_im2col_cuda: no kernel image is available for execution on the device
THCudaCheck FAIL file=/sda/okay/ai/project/ocrquesseg-svr/dyhead/csrc/cuda/SigmoidFocalLoss_cuda.cu line=139 error=209 : no kernel image is available for execution on the device
ERROR [04/01 18:08:11 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
File "/sda/okay/ai/project/ocrquesseg-svr/detectron2/detectron2/engine/train_loop.py", line 149, in train
self.run_step()
File "/sda/okay/ai/project/ocrquesseg-svr/detectron2/detectron2/engine/defaults.py", line 499, in run_step
self._trainer.run_step()
File "/sda/okay/ai/project/ocrquesseg-svr/detectron2/detectron2/engine/train_loop.py", line 273, in run_step
loss_dict = self.model(data)
File "/sda/program/anaconda3/envs/detectron2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/sda/okay/ai/project/ocrquesseg-svr/extra/atss.py", line 204, in forward
losses = self.losses(anchors, pred_logits, gt_labels, pred_anchor_deltas, gt_boxes, pred_centers)
File "/sda/okay/ai/project/ocrquesseg-svr/extra/atss.py", line 236, in losses
cls_loss = self.classification_loss_func(box_cls_flatten, labels_flatten.int()) / num_pos_avg_per_gpu
File "/sda/program/anaconda3/envs/detectron2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/sda/okay/ai/project/ocrquesseg-svr/extra/sigmoid_focal_loss.py", line 48, in forward
loss = sigmoid_focal_loss_cuda(logits, targets, self.gamma, self.alpha)
File "/sda/okay/ai/project/ocrquesseg-svr/extra/sigmoid_focal_loss.py", line 20, in forward
logits, targets, num_classes, gamma, alpha
RuntimeError: cuda runtime error (209) : no kernel image is available for execution on the device at /sda/okay/ai/project/ocrquesseg-svr/dyhead/csrc/cuda/SigmoidFocalLoss_cuda.cu:139
[04/01 18:08:11 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
[04/01 18:08:11 d2.utils.events]: iter: 0 lr: N/A max_mem: 6574M
Traceback (most recent call last):
File "train_net.py", line 224, in <module>
args=(args,),
File "/sda/okay/ai/project/ocrquesseg-svr/detectron2/detectron2/engine/launch.py", line 82, in launch
main_func(*args)
File "train_net.py", line 210, in main
return trainer.train()
File "/sda/okay/ai/project/ocrquesseg-svr/detectron2/detectron2/engine/defaults.py", line 489, in train
super().train(self.start_iter, self.max_iter)
File "/sda/okay/ai/project/ocrquesseg-svr/detectron2/detectron2/engine/train_loop.py", line 149, in train
self.run_step()
File "/sda/okay/ai/project/ocrquesseg-svr/detectron2/detectron2/engine/defaults.py", line 499, in run_step
self._trainer.run_step()
File "/sda/okay/ai/project/ocrquesseg-svr/detectron2/detectron2/engine/train_loop.py", line 273, in run_step
loss_dict = self.model(data)
File "/sda/program/anaconda3/envs/detectron2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/sda/okay/ai/project/ocrquesseg-svr/extra/atss.py", line 204, in forward
losses = self.losses(anchors, pred_logits, gt_labels, pred_anchor_deltas, gt_boxes, pred_centers)
File "/sda/okay/ai/project/ocrquesseg-svr/extra/atss.py", line 236, in losses
cls_loss = self.classification_loss_func(box_cls_flatten, labels_flatten.int()) / num_pos_avg_per_gpu
File "/sda/program/anaconda3/envs/detectron2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/sda/okay/ai/project/ocrquesseg-svr/extra/sigmoid_focal_loss.py", line 48, in forward
loss = sigmoid_focal_loss_cuda(logits, targets, self.gamma, self.alpha)
File "/sda/okay/ai/project/ocrquesseg-svr/extra/sigmoid_focal_loss.py", line 20, in forward
logits, targets, num_classes, gamma, alpha
RuntimeError: cuda runtime error (209) : no kernel image is available for execution on the device at /sda/okay/ai/project/ocrquesseg-svr/dyhead/csrc/cuda/SigmoidFocalLoss_cuda.cu:139

ZhanTao · Answer 1 · Wed Apr 06 2022 09:35:19 GMT+0800 (China Standard Time)

conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge

This fixed it for me.
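Side note: the pip list in the question mixes build tags (torch and torchvision are `+cu111`, torchaudio is `+rocm4.1`), and the conda command above reinstalls all three from one channel. Whether that mismatch contributed to the error is not established in this thread. Below is a minimal sketch for spotting mixed tags. The helper names `build_tag` and `group_by_build_tag` are made up for illustration, and the versions are copied from the question.

```python
def build_tag(version):
    """Return the local build tag after '+' (e.g. 'cu111'), or None if absent."""
    _, sep, local = version.partition("+")
    return local if sep else None


def group_by_build_tag(packages):
    """Map each build tag to the list of package names that carry it."""
    groups = {}
    for name, version in packages.items():
        tag = build_tag(version)
        if tag is not None:  # packages without a '+tag' suffix are ignored
            groups.setdefault(tag, []).append(name)
    return groups


# Versions copied from the pip list in the question.
reported = {
    "detectron2": "0.6+cu111",
    "torch": "1.10.1+cu111",
    "torchaudio": "0.10.1+rocm4.1",
    "torchvision": "0.11.2+cu111",
    "tqdm": "4.63.1",
}

groups = group_by_build_tag(reported)
if len(groups) > 1:
    print("Mixed build tags:", groups)
```

If more than one tag shows up, reinstalling the whole torch family from a single source, as in the command above, is the simplest way to get them consistent again.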

ZhanTao · Answer 2 · Wed Apr 06 2022 09:36:34 GMT+0800 (China Standard Time)

After installing PyTorch with conda, rebuild the extensions:

pip install -e .
pip install -e detectron2
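Some background on why the rebuild matters: "no kernel image is available for execution on the device" generally means the compiled CUDA code contains no kernels for the GPU's compute capability. The thread does not confirm the root cause here, so treat the following as a diagnostic sketch only. It assumes `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()` are available in your PyTorch version. Note that `get_arch_list()` reports what the PyTorch binary itself was built for, not what the locally compiled detectron2/dyhead extensions contain. The helpers `has_kernel_image` and `arch_list_env` are hypothetical names.

```python
def has_kernel_image(capability, arch_list):
    """Return True if some entry in arch_list can run on a GPU with this capability.

    capability: (major, minor) tuple, e.g. (8, 6).
    arch_list:  strings such as "sm_80" (compiled binary) or "compute_80" (PTX).
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if len(digits) < 2 or not digits.isdigit():
            continue  # skip entries this sketch does not understand, e.g. "sm_90a"
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        # Binary code runs on the same major version with an equal or higher minor.
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True
        # PTX can be JIT-compiled for any equal or newer architecture.
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            return True
    return False


def arch_list_env(capability):
    """Build the TORCH_CUDA_ARCH_LIST setting for a single GPU capability."""
    major, minor = capability
    return {"TORCH_CUDA_ARCH_LIST": f"{major}.{minor}"}


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    if not torch.cuda.is_available():
        print("No CUDA device visible to PyTorch.")
        return
    capability = torch.cuda.get_device_capability()
    arch_list = torch.cuda.get_arch_list()
    print("GPU capability:", capability)
    print("PyTorch built for:", arch_list)
    if not has_kernel_image(capability, arch_list):
        print("This PyTorch build has no kernels for this GPU.")
    # Setting this variable before `pip install -e .` asks
    # torch.utils.cpp_extension to compile for this GPU.
    print("Suggested build setting:", arch_list_env(capability))


if __name__ == "__main__":
    main()
```

If the PyTorch build does cover your GPU, the stale extension binaries are the more likely suspect, and rebuilding them with the two `pip install -e` commands above (optionally with `TORCH_CUDA_ARCH_LIST` exported first) is the thing to try.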