RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
zxy630 opened this issue · comments
I tried to run ‘./deeplesion/eval.sh ./deeplesion/mconfigs/densenet_a3d.py ./deeplesion/model_weights/adap_7slice_weigts.pth’, but I get the error below. It has been bothering me for days.
Here is the info
'''
./deeplesion/mconfigs/densenet_a3d.py
a3d 7 slice
[ ] 0/160, elapsed: 0s, ETA:Traceback (most recent call last):
  File "./deeplesion/eval.py", line 210, in <module>
    main(checkpoint, cfg_path)
  File "./deeplesion/eval.py", line 196, in main
    outputs = single_gpu_test(model, dl)
  File "./deeplesion/eval.py", line 101, in single_gpu_test
    r = model(return_loss=False, rescale=False, **data)
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk/user/zxy/project/AlignShift/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/disk/user/zxy/project/AlignShift/mmdet/models/detectors/base.py", line 122, in forward
    return self.forward_test(img, img_meta, **kwargs)
  File "/disk/user/zxy/project/AlignShift/mmdet/models/detectors/base.py", line 105, in forward_test
    return self.simple_test(imgs, img_metas, **kwargs)
  File "/disk/user/zxy/project/AlignShift/mmdet/models/detectors/two_stage.py", line 268, in simple_test
    x = self.extract_feat(img)
  File "/disk/user/zxy/project/AlignShift/mmdet/models/detectors/two_stage.py", line 92, in extract_feat
    x = self.backbone(img)
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk/user/zxy/project/AlignShift/nn/models/truncated_densenet3d_a3d.py", line 168, in forward
    x = self.conv0(x)
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk/user/zxy/project/AlignShift/nn/operators/a3dconv.py", line 59, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
'''
I would appreciate any suggestions. Thanks so much.
My environment: PyTorch=1.3.1, torchvision=0.4.2, CUDA=10.1.243, tested on a machine with four 3090 GPUs.
Hi,
This looks like a CUDA error, possibly caused by a corrupted PyTorch environment. Try running a single conv module to check whether the environment is in good condition. If it is not, reinstalling PyTorch may solve the problem.
I have tried torch 1.3.1, 1.5.0, 1.7.1, and 1.8.0, and the problem persists with every one of them.
Could you tell me which versions you tested with, including torch, CUDA, and the GPU, if convenient?
Thanks.
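When comparing environments in a thread like this, it can help to collect the relevant version numbers in one place. The following is a minimal sketch; the helper name `report_environment` is hypothetical, and it relies only on standard `torch` attributes. One value worth looking at is the compute capability: a 3090 is an Ampere card (compute capability 8.6), which as far as I know needs a PyTorch build compiled against CUDA 11 or newer, so a CUDA 10.1 build is a plausible (unconfirmed) cause of the cuDNN error above.

```python
import torch


def report_environment():
    """Collect torch / CUDA / cuDNN / GPU details into a dict (hypothetical helper)."""
    info = {
        "torch": torch.__version__,
        # CUDA version the installed torch build was compiled against (None on CPU-only builds)
        "cuda_build": torch.version.cuda,
        # cuDNN version bundled with torch (None if cuDNN is unavailable)
        "cudnn": torch.backends.cudnn.version(),
        "cuda_available": torch.cuda.is_available(),
        "gpus": [],
    }
    if info["cuda_available"]:
        for i in range(torch.cuda.device_count()):
            info["gpus"].append({
                "name": torch.cuda.get_device_name(i),
                # (major, minor) compute capability of the device
                "capability": torch.cuda.get_device_capability(i),
            })
    return info


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```

Pasting this output alongside a traceback makes it easier to spot a mismatch between the GPU and the CUDA version of the installed build.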
The traceback you provided shows that torch cannot run a conv module successfully. So try running a single conv module to see whether torch works, like this:
import torch
conv = torch.nn.Conv2d(4, 16, 3).cuda()
x = torch.rand(2, 4, 128, 128).cuda()  # B, C, H, W
y = conv(x)
print(y.shape)  # reaching this line means the conv ran on the GPU
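The same check can be extended to a 3D convolution, since the failing call in the first traceback sits in `a3dconv.py`, which presumably wraps a 3D conv (an assumption, the file itself is not shown here). This is only a sketch: `check_conv` is a hypothetical helper, the tensor sizes are arbitrary, and it falls back to the CPU when no GPU is visible so that a CPU run can serve as a baseline.

```python
import torch


def check_conv(device=None):
    """Run one Conv2d and one Conv3d forward pass and return the output shapes."""
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"

    conv2d = torch.nn.Conv2d(4, 16, 3).to(device)
    conv3d = torch.nn.Conv3d(4, 16, 3).to(device)
    x2 = torch.rand(2, 4, 128, 128, device=device)   # B, C, H, W
    x3 = torch.rand(2, 4, 7, 64, 64, device=device)  # B, C, D, H, W

    with torch.no_grad():
        shapes = {
            "conv2d": tuple(conv2d(x2).shape),
            "conv3d": tuple(conv3d(x3).shape),
        }

    if str(device).startswith("cuda"):
        # CUDA kernels run asynchronously; synchronizing surfaces errors here
        torch.cuda.synchronize()
    return shapes


if __name__ == "__main__":
    print(check_conv())
```

If this fails on the GPU but succeeds with `check_conv("cpu")`, the problem is likely in the CUDA/cuDNN setup rather than in the model code.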
Excuse me.
Evaluation now runs fine, but training fails with the following error:
'''
Traceback (most recent call last):
  File "./deeplesion/train_dist.py", line 121, in <module>
    main(args)
  File "./deeplesion/train_dist.py", line 116, in main
    logger=logger)
  File "/home/zhangyi/workplace/AlignShiftv2/mmdet/apis/train.py", line 68, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/zhangyi/workplace/AlignShiftv2/mmdet/apis/train.py", line 204, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/mmcv/runner/runner.py", line 358, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/mmcv/runner/runner.py", line 260, in train
    for i, data_batch in enumerate(data_loader):
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 346, in __next__
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zhangyi/workplace/AlignShiftv2/deeplesion/dataset/DeepLesionDataset_a3d.py", line 110, in __getitem__
    results = self.pre_pipeline(results)
  File "/home/zhangyi/workplace/AlignShiftv2/mmdet/datasets/pipelines/compose.py", line 24, in __call__
    data1 = t(data)
  File "/home/zhangyi/workplace/AlignShiftv2/mmdet/datasets/pipelines/transforms.py", line 817, in __call__
    results = self.aug(**results)
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/albumentations/core/composition.py", line 158, in __call__
    data = t(force_apply=force_apply, **data)
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/albumentations/core/transforms_interface.py", line 65, in __call__
    res[key] = target_function(arg, **dict(params, **target_dependencies))
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/albumentations/augmentations/transforms.py", line 513, in apply
    return F.shift_scale_rotate(img, angle, scale, dx, dy, interpolation, self.border_mode, self.value)
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/albumentations/augmentations/functional.py", line 58, in wrapped_function
    result = func(img, *args, **kwargs)
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/albumentations/augmentations/functional.py", line 168, in shift_scale_rotate
    img = cv2.warpAffine(img, matrix, (width, height), flags=interpolation, borderMode=border_mode, borderValue=value)
cv2.error: OpenCV(4.1.0) /io/opencv/modules/imgproc/src/imgwarp.cpp:2597: error: (-215:Assertion failed) _src.channels() <= 4 || (interpolation != INTER_LANCZOS4 && interpolation != INTER_CUBIC) in function 'warpAffine'
'''
I have tried many OpenCV versions, but none of them works. Can you give me some tips?
Check your albumentations version and use an OpenCV version that is compatible with it.