cudaGetLastError() == cudaSuccess INTERNAL ASSERT FAILED

Question

luyvlei opened this issue 3 years ago

When I use the script to train the model, the following error occurs:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/lll/anaconda3/envs/ICT_py37_torch14/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/lll/pycharm_project/IAST/code/main.py", line 71, in main_worker
    train_net(net=net, cfg=cfg, gpu=proc_idx)
  File "/home/lll/pycharm_project/IAST/code/sseg/workflow/trainer.py", line 280, in train_net
    intersection, union = intersectionAndUnionGPU(label_pred, labels, n_class)
  File "/home/lll/pycharm_project/IAST/code/sseg/datasets/metrics/miou.py", line 72, in intersectionAndUnionGPU
    area_intersection = torch.histc(intersection.float(), bins=K, min=0, max=K-1)
RuntimeError: cudaGetLastError() == cudaSuccess INTERNAL ASSERT FAILED at /tmp/pip-req-build-ufslq_a9/aten/src/ATen/native/cuda/SummaryOps.cu:253, please report a bug to PyTorch. kernelHistogram1D failed
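
CUDA kernels run asynchronously, so the line named in a traceback like the one above is not necessarily where the failure started. The assert only reports whatever cudaGetLastError() returned when histc checked it. One way to narrow this down is to force synchronous kernel launches with the CUDA_LAUNCH_BLOCKING environment variable before CUDA is initialized. A minimal sketch, assuming the variable is set at the very top of main.py before torch is imported (the helper name enable_sync_cuda_errors is hypothetical, not part of the project):

```python
import os


def enable_sync_cuda_errors(env=os.environ):
    """Ask the CUDA runtime to launch kernels synchronously.

    Hypothetical helper for debugging only. It must run before the first
    CUDA call in the process to have any effect, and it slows training down.
    Processes started afterwards inherit the setting through the environment.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


# Intended use (assumption): call this first, then import torch.
# enable_sync_cuda_errors()
# import torch
```

With synchronous launches the traceback should point at the operation that actually failed, which may or may not be torch.histc.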

I am training on a single 1080 Ti with PyTorch 1.4, and apex was installed via conda. The following warnings are also issued at runtime:

Warning:  multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback.  Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.

Is this issue related to my APEX installation? It seems that conda's apex is incomplete: it does not include cuda_ext and cpp_ext.
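
A quick way to check which compiled apex pieces are present in the current environment is to look for the extension modules without importing them. This is a rough sketch: amp_C is the module named in the warning above, while apex_C as the module produced by --cpp_ext is an assumption, and the function names are hypothetical.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def apex_extension_status() -> dict:
    """Report which apex-related modules are installed.

    "amp_C" comes from the ImportError in the warning; "apex_C" is assumed
    to be the module built by --cpp_ext.
    """
    return {name: has_module(name) for name in ("apex", "amp_C", "apex_C")}


# Intended use: print(apex_extension_status())
```

If apex is found but amp_C is not, the install is a Python-only build, which matches the fallback warnings. That alone does not show that the missing extensions cause the histc failure.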

Wenqi Tang · Answer 1 · Sat Jul 03 2021 01:15:23 GMT+0800 (China Standard Time)

@luyvlei For performance and full functionality, I suggest you follow this to install APEX.