cudaGetLastError() == cudaSuccess INTERNAL ASSERT FAILED
luyvlei opened this issue
When I use the script to train the model, the following error occurs:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/lll/anaconda3/envs/ICT_py37_torch14/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/lll/pycharm_project/IAST/code/main.py", line 71, in main_worker
    train_net(net=net, cfg=cfg, gpu=proc_idx)
  File "/home/lll/pycharm_project/IAST/code/sseg/workflow/trainer.py", line 280, in train_net
    intersection, union = intersectionAndUnionGPU(label_pred, labels, n_class)
  File "/home/lll/pycharm_project/IAST/code/sseg/datasets/metrics/miou.py", line 72, in intersectionAndUnionGPU
    area_intersection = torch.histc(intersection.float(), bins=K, min=0, max=K-1)
RuntimeError: cudaGetLastError() == cudaSuccess INTERNAL ASSERT FAILED at /tmp/pip-req-build-ufslq_a9/aten/src/ATen/native/cuda/SummaryOps.cu:253, please report a bug to PyTorch. kernelHistogram1D failed
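The cause of this failure is not established here. Because CUDA errors can be reported asynchronously, the error shown at `torch.histc` may have been raised by an earlier kernel, or by unexpected input values. The following is a minimal sketch for narrowing that down, not the repository's implementation: it recomputes the intersection and union counts on the CPU with `torch.bincount` and checks that all class ids are in range first. The function name `intersection_and_union_checked` and the `ignore_index=255` default are assumptions made for illustration.

```python
import torch


def intersection_and_union_checked(pred, target, num_classes, ignore_index=255):
    """Hypothetical CPU cross-check for per-class intersection/union counts.

    `ignore_index=255` is an assumed convention, not taken from the traceback.
    """
    # Move to CPU so any problem surfaces as a plain Python exception.
    pred = pred.detach().reshape(-1).cpu().long()
    target = target.detach().reshape(-1).cpu().long()
    if pred.shape != target.shape:
        raise ValueError("pred and target must contain the same number of elements")

    # Drop pixels that are marked as ignored in the target.
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Fail early if any class id falls outside [0, num_classes - 1].
    for name, t in (("pred", pred), ("target", target)):
        if t.numel() > 0:
            lo, hi = int(t.min()), int(t.max())
            if lo < 0 or hi >= num_classes:
                raise ValueError(
                    f"{name} has values in [{lo}, {hi}], expected [0, {num_classes - 1}]"
                )

    # Per-class counts; bincount plays the role histc has in the traceback.
    intersection = pred[pred == target]
    area_intersection = torch.bincount(intersection, minlength=num_classes)
    area_pred = torch.bincount(pred, minlength=num_classes)
    area_target = torch.bincount(target, minlength=num_classes)
    area_union = area_pred + area_target - area_intersection
    return area_intersection, area_union
```

If this check passes on the same batch that crashes on the GPU, the inputs are probably not the problem. Running the training script with the environment variable `CUDA_LAUNCH_BLOCKING=1` may then help show which kernel actually fails.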
I am training on a single 1080 Ti with PyTorch 1.4, and my apex was installed via conda. The following warnings are also issued at runtime:
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
Is this issue related to my apex installation? It seems that conda's apex build is incomplete, since it does not include cuda_ext and cpp_ext.
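One way to confirm which apex extensions are present is to ask Python whether the compiled modules can be found. This is a minimal sketch using only the standard library. `amp_C` is the module named in the warning above. `apex_C` is assumed here to be the module built by `--cpp_ext`, and the helper name `apex_extension_status` is hypothetical.

```python
import importlib.util


def apex_extension_status(modules=("apex", "amp_C", "apex_C")):
    """Report which of the given modules can be located in this environment.

    The default names are assumptions: `amp_C` comes from the warning text,
    `apex_C` is assumed to be the extension built by --cpp_ext.
    """
    status = {}
    for name in modules:
        try:
            # find_spec returns None when a top-level module is not installed.
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in apex_extension_status().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A result where `apex` is found but `amp_C` is missing would match the warnings above. It would only show that apex is using its Python fallbacks, not that the missing extensions cause the `torch.histc` failure.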