huzixuan1 / Object_Dete_Masking

Object Detection About Masking

Training failed

lkp520 opened this issue

/home/amax/anaconda3/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:129: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
/home/amax/kp_pytorch/yolov3/project/utils.py:608: MatplotlibDeprecationWarning: Passing non-integers as three-element position specification is deprecated since 3.3 and will be removed two minor releases later.
plt.subplot(ns, ns, i + 1).imshow(imgs[i].transpose(1, 2, 0))
/home/amax/anaconda3/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Traceback (most recent call last):
File "train.py", line 334, in <module>
results = train(
File "train.py", line 237, in train
loss.backward()
File "/home/amax/anaconda3/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/amax/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
Variable._execution_engine.run_backward(
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
/pytorch/aten/src/ATen/native/cuda/Loss.cu:455: nll_loss_backward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [3,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:455: nll_loss_backward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [5,0,0] Assertion t >= 0 && t < n_classes failed.
terminate called after throwing an instance of 'c10::Error'
what(): NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:181, unhandled cuda error, NCCL version 21.0.3
Process Group destroyed on rank 0
Exception raised from ncclCommAbort at ../torch/csrc/distributed/c10d/NCCLUtils.hpp:181 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f9763f54d62 in /home/amax/anaconda3/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f9763f5168b in /home/amax/anaconda3/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: + 0x30a6cae (0x7f97c122ccae in /home/amax/anaconda3/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0x113 (0x7f97c1215853 in /home/amax/anaconda3/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #4: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0x9 (0x7f97c1215a79 in /home/amax/anaconda3/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #5: + 0xe6e6b6 (0x7f982b4f26b6 in /home/amax/anaconda3/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0xe543c5 (0x7f982b4d83c5 in /home/amax/anaconda3/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: + 0x2a3790 (0x7f982a927790 in /home/amax/anaconda3/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: + 0x2a49fe (0x7f982a9289fe in /home/amax/anaconda3/lib/python3.8/site-packages/torch/lib/libtorch_python.so)

frame #18: __libc_start_main + 0xf3 (0x7f982e1000b3 in /lib/x86_64-linux-gnu/libc.so.6)

A whole pile of errors. What should I do?
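A note on reading the log above: the line `Assertion t >= 0 && t < n_classes failed` from `Loss.cu` says that at least one target class index passed to the loss was outside the range `[0, n_classes)`. CUDA reports errors asynchronously, so the cuDNN and NCCL errors around it may well be consequences of that assertion rather than separate problems. One way to check is to scan the label files for class ids the model was not configured for. The sketch below assumes YOLO-style `.txt` labels with the class id as the first field on each row; the function name, the directory layout, and the `n_classes` value are hypothetical and not taken from this repository.

```python
from pathlib import Path
from typing import List, Tuple


def find_bad_labels(label_dir: str, n_classes: int) -> List[Tuple[str, int, str]]:
    """Return (file name, line number, class field) for every label row
    whose class id is not an integer in [0, n_classes).

    Assumption: one object per row, class id first (YOLO-style text labels).
    """
    bad = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        for line_no, line in enumerate(path.read_text().splitlines(), start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank rows
            try:
                value = float(fields[0])
            except ValueError:
                bad.append((path.name, line_no, fields[0]))
                continue
            # Reject non-integer ids as well as ids outside the valid range.
            if not value.is_integer() or not 0 <= value < n_classes:
                bad.append((path.name, line_no, fields[0]))
    return bad
```

Calling `find_bad_labels("path/to/labels", n_classes)` with the class count the model was built with lists every offending row. If it returns rows, either the labels need renumbering so they start at 0, or the class count in the model configuration is too small. Running the training script on a single GPU with the environment variable `CUDA_LAUNCH_BLOCKING=1` can also make the traceback point at the call that actually failed.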
Now there is a second error:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/amax/anaconda3/envs/new_env/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/amax/anaconda3/envs/new_env/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/amax/anaconda3/envs/new_env/lib/python3.6/multiprocessing/resource_sharer.py", line 139, in _serve
signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
File "/home/amax/anaconda3/envs/new_env/lib/python3.6/signal.py", line 60, in pthread_sigmask
sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range
What should I do?
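A note on the second traceback: it comes from a different interpreter than the first one (`envs/new_env/lib/python3.6` instead of `python3.8`), and it fails inside the standard library, where `resource_sharer.py` tries to block every signal in `range(1, signal.NSIG)`. On Linux with glibc, signal numbers 32 and 33 are reserved for the threading implementation, so some Python and glibc combinations reject them. As far as I know, newer Python versions block only the signals the platform accepts, so running the training under the Python 3.8 environment again is probably the simplest way around it. The sketch below is only a diagnostic, not a fix: it lists which signal numbers the running interpreter would refuse. It relies on `signal.valid_signals()`, which exists from Python 3.8 onward; the function name is hypothetical.

```python
import signal
import sys


def rejected_signal_numbers():
    """Return the signal numbers in range(1, NSIG) that this platform does
    not accept, or None when the interpreter is too old to tell."""
    if not hasattr(signal, "valid_signals"):  # added in Python 3.8
        return None
    valid = {int(s) for s in signal.valid_signals()}
    return sorted(set(range(1, signal.NSIG)) - valid)


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("signal numbers rejected by this platform:", rejected_signal_numbers())
```

If the script prints `None`, the interpreter predates `signal.valid_signals()`, which is itself a hint that the environment is older than the one used for the first run.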