no kernel image is available for execution on the device at src/sigmoid_focal_loss_cuda.cu:128

rivercn opened this issue 3 years ago · comments

### Hardware environment:

1. CentOS 7 server
2. System CUDA version: CUDA 10.0
3. conda environment: Python 3, PyTorch 1.4, cudatoolkit 10.1

(PSLI) [zhusong@localhost SOLO]$ nvidia-smi
Fri Aug 13 10:56:36 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.57 Driver Version: 450.57 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:02:00.0 Off | N/A |
| 29% 45C P0 63W / 250W | 0MiB / 12196MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 00000000:03:00.0 Off | N/A |
| 33% 47C P0 63W / 250W | 0MiB / 12196MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 TITAN Xp Off | 00000000:82:00.0 Off | N/A |
| 32% 46C P0 60W / 250W | 0MiB / 12196MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 TITAN Xp Off | 00000000:83:00.0 Off | N/A |
| 35% 49C P0 59W / 250W | 0MiB / 12196MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

### Error output:

loading annotations into memory...
Done (t=13.39s)
creating index...
index created!
2021-08-13 10:47:28,953 - mmdet - INFO - Start running, host: zhusong@localhost.localdomain, work_dir: /home/zhusong/project/SOLO/work_dirs/decoupled_solo_light_release_r50_fpn_8gpu_3x
2021-08-13 10:47:28,953 - mmdet - INFO - workflow: [('train', 1)], max: 36 epochs
/home/zhusong/.conda/envs/PSLI/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
THCudaCheck FAIL file=mmdet/ops/sigmoid_focal_loss/src/sigmoid_focal_loss_cuda.cu line=128 error=209 : no kernel image is available for execution on the device
Traceback (most recent call last):
File "tools/train.py", line 125, in <module>
main()
File "tools/train.py", line 121, in main
timestamp=timestamp)
File "/home/zhusong/project/SOLO/mmdet/apis/train.py", line 111, in train_detector
timestamp=timestamp)
File "/home/zhusong/project/SOLO/mmdet/apis/train.py", line 297, in _non_dist_train
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/zhusong/.conda/envs/PSLI/lib/python3.7/site-packages/mmcv-0.2.16-py3.7-linux-x86_64.egg/mmcv/runner/runner.py", line 364, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/zhusong/.conda/envs/PSLI/lib/python3.7/site-packages/mmcv-0.2.16-py3.7-linux-x86_64.egg/mmcv/runner/runner.py", line 268, in train
self.model, data_batch, train_mode=True, **kwargs)
File "/home/zhusong/project/SOLO/mmdet/apis/train.py", line 78, in batch_processor
losses = model(**data)
File "/home/zhusong/.conda/envs/PSLI/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/zhusong/.conda/envs/PSLI/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/zhusong/.conda/envs/PSLI/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/zhusong/project/SOLO/mmdet/core/fp16/decorators.py", line 49, in new_func
return old_func(*args, **kwargs)
File "/home/zhusong/project/SOLO/mmdet/models/detectors/base.py", line 142, in forward
return self.forward_train(img, img_meta, **kwargs)
File "/home/zhusong/project/SOLO/mmdet/models/detectors/single_stage_ins.py", line 78, in forward_train
*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
File "/home/zhusong/project/SOLO/mmdet/models/anchor_heads/decoupled_solo_light_head.py", line 258, in loss
loss_cate = self.loss_cate(flatten_cate_preds, flatten_cate_labels, avg_factor=num_ins + 1)
File "/home/zhusong/.conda/envs/PSLI/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/zhusong/project/SOLO/mmdet/models/losses/focal_loss.py", line 79, in forward
avg_factor=avg_factor)
File "/home/zhusong/project/SOLO/mmdet/models/losses/focal_loss.py", line 37, in sigmoid_focal_loss
loss = _sigmoid_focal_loss(pred, target, gamma, alpha)
File "/home/zhusong/project/SOLO/mmdet/ops/sigmoid_focal_loss/sigmoid_focal_loss.py", line 19, in forward
gamma, alpha)
RuntimeError: cuda runtime error (209) : no kernel image is available for execution on the device at mmdet/ops/sigmoid_focal_loss/src/sigmoid_focal_loss_cuda.cu:128
#44

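For verification I followed the post below; its example code runs without problems:
https://heary.cn/posts/PyTorch%E6%8A%A5CUDA-error-no-kernel-image-is-available-for-execution-on-the-device%E9%97%AE%E9%A2%98%E8%A7%A3%E5%86%B3/
I have not tested a lower PyTorch version yet. Is there another fix inside sigmoid_focal_loss_cuda.cu? It looks like a problem with computing the two focal loss hyperparameters.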

Raymond · Answer 1 · Tue Dec 07 2021 16:28:11 GMT+0800 (China Standard Time)

Same error here, please help.

Raymond · Answer 2 · Mon Dec 13 2021 11:37:30 GMT+0800 (China Standard Time)

I fixed it.
The error was caused by changing the hardware (GPU 1080 to A4000).
Just remove the build files under SOLOv2 and rebuild it.
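
The fix above amounts to deleting stale compiled extensions so that they are recompiled for the new GPU. Here is a minimal sketch of that cleanup. It assumes the compiled ops are `*.so` files under `mmdet/` plus a top-level `build/` directory. The helper names and that layout are hypothetical, not taken from the SOLO repository. `TORCH_CUDA_ARCH_LIST` is the environment variable PyTorch's extension builder reads to choose target architectures. The rebuild command in the final comment is an assumption, as is reading "GPU1080" as a GTX 1080.

```python
import shutil
from pathlib import Path


def find_stale_artifacts(repo_root):
    """Return the build directory and compiled extension files under repo_root.

    Assumed layout (hypothetical): a top-level ``build/`` directory and
    ``*.so`` files somewhere below ``mmdet/``.
    """
    root = Path(repo_root)
    stale = []
    build_dir = root / "build"
    if build_dir.is_dir():
        stale.append(build_dir)
    ops_root = root / "mmdet"
    if ops_root.is_dir():
        stale.extend(sorted(ops_root.rglob("*.so")))
    return stale


def clean(repo_root):
    """Delete every stale artifact and return the paths that were removed."""
    removed = find_stale_artifacts(repo_root)
    for path in removed:
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    return removed


def arch_list(capabilities):
    """Format (major, minor) pairs as a TORCH_CUDA_ARCH_LIST value."""
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


# A GTX 1080 has compute capability 6.1 and an RTX A4000 has 8.6.
# After clean("."), rebuild with the value exported, e.g. (assumed command):
#   TORCH_CUDA_ARCH_LIST="6.1;8.6" python setup.py develop
print(arch_list([(6, 1), (8, 6)]))
```

Listing both capabilities keeps the rebuilt extension usable on either card, at the cost of a longer compile.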

dragonhaha · Answer 3 · Fri Feb 25 2022 13:48:00 GMT+0800 (China Standard Time)

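I hit the same problem in a Colab environment, and the fix suggested in the answer above did not solve it.
I replaced the contents of mmdet/ops/sigmoid_focal_loss/src/sigmoid_focal_loss.cpp and mmdet/ops/sigmoid_focal_loss/src/sigmoid_focal_loss.cpp
with the contents at the link below, then rebuilt mmdet, and it finally ran successfully.
link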

Environment:
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GPU 0: Tesla P100-PCIE-16GB
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0+cu111
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.1
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
CuDNN 8.0.5
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.11.1+cu111
OpenCV: 4.1.2
MMCV: 0.2.16
MMDetection: 1.0.0+95f3732
MMDetection Compiler: GCC 7.5
MMDetection CUDA Compiler: 11.1
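
The `NVCC architecture flags` line in the dump above lists the architectures the PyTorch binary itself was compiled for. The error in this thread appears when a compiled extension holds no code for the running GPU's architecture. Below is a minimal sketch for checking whether a given compute capability is covered by such a list. It assumes the flags follow the `arch=compute_XX,code=sm_XX` pattern shown above. The function names are hypothetical, and PTX JIT fallback is ignored.

```python
import re


def parse_sm_targets(flags):
    """Extract sorted (major, minor) pairs from every ``code=sm_XY`` entry."""
    digits = re.findall(r"code=sm_(\d+)", flags)
    return sorted({(int(d[:-1]), int(d[-1])) for d in digits})


def has_binary_for(capability, targets):
    """True if some target's binary can run on a device of this capability.

    A CUDA binary runs on devices with the same major version and an
    equal or higher minor version. PTX fallback is deliberately ignored.
    """
    major, minor = capability
    return any(t_major == major and t_minor <= minor
               for t_major, t_minor in targets)


# Flags copied from the environment dump above.
FLAGS = (
    "-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;"
    "-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;"
    "-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;"
    "-gencode;arch=compute_86,code=sm_86"
)

targets = parse_sm_targets(FLAGS)
# Tesla P100 is compute capability 6.0; TITAN Xp is 6.1.
print(has_binary_for((6, 0), targets))
print(has_binary_for((6, 1), targets))
```

This only describes the PyTorch binary. The mmdet extension is compiled separately, so its own target list can differ from the one PyTorch reports.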