fcjian / TOOD

TOOD: Task-aligned One-stage Object Detection, ICCV2021 Oral

Error during training (Assertion input_val >= zero && input_val <= one failed.)

MeteoriteWeny opened this issue

Problem

Thank you for your contribution. I encountered exploding gradients while training the model tood_r50_fpn_1x_coco.

  • I tried training this model with mixed-precision training, with the loss scale set to 'dynamic'. Training soon stopped and raised RuntimeError: CUDA error: device-side assert triggered.

  • I also retrained the model in FP32 precision, but that did not help.

  • A lower learning rate did not fix the exploding gradients.

  • Gradient clipping avoids the training failure (mixed-precision training, loss scale=512), but the model does not converge.

    I searched for this issue online. I do not think it is an OOM error. It seems to be related to NaN values in the prediction head, which then cause the error when the loss is calculated. I do not know whether the environment (MMDet-2.15.0, per the environment info below) affects training.
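The assertion in the error report, `input_val >= zero && input_val <= one`, fails for any value outside [0, 1] and also for NaN, because every comparison against NaN is false. That is consistent with the NaN hypothesis above. The following is a minimal pure-Python sketch of the same check; the function name `find_invalid_probabilities` and the sample scores are hypothetical and are not part of TOOD, MMDetection, or PyTorch.

```python
import math


def find_invalid_probabilities(values):
    """Return (index, reason) pairs for values that would fail
    the check `input_val >= 0 && input_val <= 1`."""
    bad = []
    for i, v in enumerate(values):
        if math.isnan(v):
            # NaN >= 0 and NaN <= 1 are both False, so NaN fails the check.
            bad.append((i, "nan"))
        elif not (0.0 <= v <= 1.0):
            # Covers negative values, values above 1, and +/- infinity.
            bad.append((i, "out_of_range"))
    return bad


# Illustrative scores, not taken from the actual training run.
scores = [0.2, float("nan"), 1.0, 1.5, -0.1, float("inf")]
print(find_invalid_probabilities(scores))
```

Running a check like this on the classification scores just before the loss is computed would show whether the failure comes from NaN values or from scores that are merely out of range.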

My modification

  • I ported the TOOD code to my working environment (MMDet-2.15.0) without modification.
  • I edited the training config to train on my own dataset.
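For reference, the mixed-precision and gradient-clipping settings described under Problem are usually written in an MMDetection 2.x config roughly as shown below. This is only a sketch: the loss scale of 512 comes from the report, while `max_norm=35` and `norm_type=2` are assumed values and not necessarily the ones used here.

```python
# Hypothetical excerpt of an MMDetection 2.x config file (configs are plain Python).

# Mixed-precision training with a static loss scale, as in the report.
# The report also tried loss_scale='dynamic'.
fp16 = dict(loss_scale=512.0)

# Gradient clipping applied by the optimizer hook.
# max_norm and norm_type are assumed values for illustration.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```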

Environment

2021-12-09 16:50:01,643 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 2070
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.4.r11.4/compiler.30033411_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.10.0
OpenCV: 4.5.3
MMCV: 1.3.10
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.15.0+87eda06
------------------------------------------------------------

Error Report

/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [32,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [33,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [34,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [35,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [36,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [37,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [38,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [39,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [40,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [41,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [42,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [43,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [44,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [45,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [46,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [47,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [48,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [49,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [50,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [51,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [52,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [53,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [54,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [55,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [56,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [57,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [58,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [59,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [60,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [61,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [62,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [19,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [32,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [33,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [34,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [35,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [36,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [37,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [38,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [39,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [40,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [41,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [42,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [43,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [44,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [45,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [46,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [47,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [48,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [49,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [50,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [51,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [52,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [53,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [54,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [55,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [56,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [57,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [58,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [59,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [60,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [61,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [62,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [3,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [4,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [5,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [6,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [7,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [8,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [9,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [10,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [11,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [12,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [13,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [14,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [15,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [16,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [17,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [18,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [19,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [20,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [21,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [22,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [23,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [24,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [25,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [26,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [27,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [28,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [29,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [30,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [33,0,0], thread: [31,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [3,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [4,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [5,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [6,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [7,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [8,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [9,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [10,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [11,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [12,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [13,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [14,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [15,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [16,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [17,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [18,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [19,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [20,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [21,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [22,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [23,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [24,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [25,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [26,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [27,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [28,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [29,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [30,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [103,0,0], thread: [31,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [3,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [4,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [5,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [6,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [7,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [8,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [9,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [10,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [11,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [12,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [13,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [14,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [15,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [16,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [17,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [18,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [19,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [20,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [21,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [22,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [23,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [24,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [25,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [26,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [27,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [28,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [29,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [30,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/Loss.cu:111: operator(): block: [31,0,0], thread: [31,0,0] Assertion `input_val >= zero && input_val <= one` failed.
Traceback (most recent call last):
  File "tools/train.py", line 188, in <module>
    main()
  File "tools/train.py", line 184, in main
    meta=meta)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
    return old_func(*args, **kwargs)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/detectors/base.py", line 171, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/detectors/single_stage.py", line 83, in forward_train
    gt_labels, gt_bboxes_ignore)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/dense_heads/base_dense_head.py", line 54, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 185, in new_func
    return old_func(*args, **kwargs)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/dense_heads/tood_head.py", line 426, in loss
    num_total_samples=num_total_samples)
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/core/utils/misc.py", line 29, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmdet-2.15.0-py3.7.egg/mmdet/models/dense_heads/tood_head.py", line 333, in loss_single
    & (labels < bg_class_ind)).nonzero().squeeze(1)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1623448265233/work/c10/cuda/CUDACachingAllocator.cpp:1055 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f12c21efa22 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x10ac3 (0x7f12c2451ac3 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a7 (0x7f12c2453167 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7f12c21d95a4 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0xa2bb12 (0x7f133bad0b12 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0xa2bbb1 (0x7f133bad0bb1 in /root/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #24: __libc_start_main + 0xe7 (0x7f1376d75bf7 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted
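
The assertion text `input_val >= zero && input_val <= one` indicates that the loss kernel received a value outside [0, 1], and a NaN fails the same check. Below is a minimal pure-Python sketch of that condition. The helper name `find_invalid_probabilities` is hypothetical and is not part of mmdet or PyTorch. It only illustrates which values would trip the assertion, so that an equivalent check could be placed on the predicted scores just before the loss call.

```python
def find_invalid_probabilities(values):
    """Return the indices whose value is NaN or lies outside [0, 1]."""
    # `not (0 <= v <= 1)` is also True for NaN, because every
    # comparison involving NaN evaluates to False.
    return [i for i, v in enumerate(values) if not (0.0 <= v <= 1.0)]


# Hypothetical scores: valid values mixed with NaN, inf, and out-of-range ones.
scores = [0.0, 0.73, 1.0, float("nan"), 1.0001, -0.2, float("inf")]
print(find_invalid_probabilities(scores))  # [3, 4, 5, 6]
```

Running the training with `CUDA_LAUNCH_BLOCKING=1`, as the error message suggests, should make the Python traceback point at the call that actually failed rather than at a later line.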

Same issue

Same issue. My mmdet version is 2.19.0, and the error is raised during the 3rd epoch of training.

You can try to clamp the value of the box area when computing GIoU loss, e.g.,

area1 = fp16_clamp((bboxes1[..., 2] - bboxes1[..., 0]), min=0) * fp16_clamp((
    bboxes1[..., 3] - bboxes1[..., 1]), min=0)
area2 = fp16_clamp((bboxes2[..., 2] - bboxes2[..., 0]), min=0) * fp16_clamp((
    bboxes2[..., 3] - bboxes2[..., 1]), min=0)
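
To see what the clamp changes, here is a scalar, pure-Python sketch of the same idea. It is an illustration under assumptions, not the mmdet implementation: `box_area` is a hypothetical helper, and the real code operates on tensors through `fp16_clamp`. A predicted box whose corners are inverted gives a negative width or height, and without the clamp the area is either negative or, when both sides are inverted, a spurious positive number.

```python
def box_area(box, clamp=True):
    """Area of an (x1, y1, x2, y2) box, optionally clamping each side at 0."""
    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    if clamp:
        # Same role as `min=0` in the suggestion above: a degenerate
        # box contributes zero area instead of a meaningless value.
        width, height = max(width, 0.0), max(height, 0.0)
    return width * height


one_side_inverted = (8.0, 2.0, 2.0, 8.0)    # x2 < x1
both_sides_inverted = (8.0, 8.0, 2.0, 2.0)  # x2 < x1 and y2 < y1

print(box_area(one_side_inverted, clamp=False))    # -36.0
print(box_area(both_sides_inverted, clamp=False))  # 36.0
print(box_area(one_side_inverted))                 # 0.0
print(box_area(both_sides_inverted))               # 0.0
```

The clamp only keeps the areas well defined. If the predictions already contain NaN, clamping the area alone would not be expected to remove it.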

Hello, I clamped the box area as you showed, but training still crashes at the 5th epoch. My mmdet version is 2.14.0+d3e713d.

Error Report:

/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [26,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [27,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [28,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [29,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [30,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [31,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [32,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [33,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [34,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [35,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [36,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [37,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [38,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [39,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [40,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [41,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [42,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [43,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [44,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [45,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [46,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [47,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [48,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [49,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [50,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [51,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [52,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [53,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [54,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [55,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [56,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [57,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [58,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [59,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [60,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [61,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [62,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [63,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [0,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [1,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [2,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [3,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [4,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [5,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [6,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [7,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [8,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [9,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [10,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [11,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [12,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [13,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [14,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [15,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [16,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [17,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [18,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [19,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [20,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [21,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [22,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [23,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [24,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [25,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [26,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [27,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [28,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [29,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [30,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [31,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [32,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [33,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [34,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [35,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [36,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [37,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [38,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [39,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [40,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [41,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [42,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [43,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [44,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [45,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [46,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [47,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [48,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [49,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [50,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [51,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [52,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [53,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [54,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [55,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [56,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [57,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [58,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [59,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [60,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [61,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [62,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [63,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [0,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [1,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [2,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [3,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [4,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [5,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [6,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [7,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [8,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [9,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [10,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [11,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [12,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [13,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [14,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [15,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [16,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [17,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [18,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [19,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [20,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [21,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [22,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [23,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [24,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [25,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [26,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [27,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [28,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [29,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [30,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [31,0,0] Assertion input_val >= zero && input_val <= one failed.
Traceback (most recent call last):
File "./tools/train.py", line 188, in <module>
main()
File "./tools/train.py", line 184, in main
meta=meta)
File "/mnt/mhm/project/TODO/TOOD/mmdet/apis/train.py", line 170, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/mnt/mhm/project/TODO/TOOD/mmdet/models/detectors/base.py", line 237, in train_step
losses = self(**data)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
return old_func(*args, **kwargs)
File "/mnt/mhm/project/TODO/TOOD/mmdet/models/detectors/base.py", line 171, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/mnt/mhm/project/TODO/TOOD/mmdet/models/detectors/single_stage.py", line 83, in forward_train
gt_labels, gt_bboxes_ignore)
File "/mnt/mhm/project/TODO/TOOD/mmdet/models/dense_heads/base_dense_head.py", line 54, in forward_train
losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 186, in new_func
return old_func(*args, **kwargs)
File "/mnt/mhm/project/TODO/TOOD/mmdet/models/dense_heads/tood_head.py", line 447, in loss
num_total_samples=num_total_samples)
File "/mnt/mhm/project/TODO/TOOD/mmdet/core/utils/misc.py", line 29, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/mnt/mhm/project/TODO/TOOD/mmdet/models/dense_heads/tood_head.py", line 354, in loss_single
& (labels < bg_class_ind)).nonzero().squeeze(1)
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1614378098133/work/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fdf062a32f2 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7fdf062a067b in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x7fdf064fc219 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7fdf0628b3a4 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: + 0x6e6a3a (0x7fdf5d204a3a in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x6e6ae1 (0x7fdf5d204ae1 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x1817da (0x55b94fc797da in /root/anaconda3/envs/open-mmlab/bin/python)
frame #7: + 0xfbfa9 (0x55b94fbf3fa9 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #8: + 0xfa8c8 (0x55b94fbf28c8 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #9: + 0xfa8c8 (0x55b94fbf28c8 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #10: + 0xfa2d8 (0x55b94fbf22d8 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #11: + 0xfad68 (0x55b94fbf2d68 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #12: + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #13: + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #14: + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #15: + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #16: + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #17: + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #18: + 0x12b327 (0x55b94fc23327 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #19: PyDict_SetItemString + 0x89 (0x55b94fc2fe59 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #20: PyImport_Cleanup + 0xab (0x55b94fca4d0b in /root/anaconda3/envs/open-mmlab/bin/python)
frame #21: Py_FinalizeEx + 0x64 (0x55b94fd19304 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #22: + 0x232960 (0x55b94fd2a960 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #23: _Py_UnixMain + 0x3c (0x55b94fd2accc in /root/anaconda3/envs/open-mmlab/bin/python)
frame #24: __libc_start_main + 0xf0 (0x7fdf9851e830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #25: + 0x1d7555 (0x55b94fccf555 in /root/anaconda3/envs/open-mmlab/bin/python)

Killing subprocess 19911
Traceback (most recent call last):
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 340, in <module>
main()
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=0',

Thank you for your reply.
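
The repeated `Assertion input_val >= zero && input_val <= one failed` lines come from `Loss.cu` and appear to be a range check on the inputs of a binary cross-entropy loss. A NaN fails that check just like an out-of-range value does, because every comparison with NaN is false. The plain-Python sketch below mirrors the comparison on the host so the offending values can be inspected. The helper names and the sample scores are hypothetical and not part of TOOD or mmdet.

```python
def bce_input_ok(p: float) -> bool:
    """Mirror the device-side check: the value must lie in [0, 1]."""
    # Any comparison with NaN is False, so a NaN input fails this check too.
    return p >= 0.0 and p <= 1.0


def find_bad_inputs(probs):
    """Return (index, value) pairs that would trip the assertion."""
    return [(i, p) for i, p in enumerate(probs) if not bce_input_ok(p)]


# Hypothetical prediction scores: index 2 is NaN, indices 4 and 5 are out of range.
scores = [0.0, 0.37, float("nan"), 1.0, 1.2, -0.01]
bad = find_bad_inputs(scores)
print(bad)
```

Running an equivalent check on the real classification scores just before the loss call would show whether the failure is caused by NaN values or by scores outside [0, 1].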

@fcjian Thanks for the reply! It solves the CUDA error, but the model still does not converge. During training, a problem similar to the one I saw with gradient clipping occurred: the log shows a sudden increase in loss, after which the loss only fluctuates within a narrow range. I'll try again with the original TOOD code, without porting it to a higher mmdet version.

2021-12-29 09:32:52,217 - mmdet - INFO - Epoch [1][600/1162]	lr: 2.000e-03, eta: 8:39:18, time: 0.544, data_time: 0.013, memory: 5142, loss_cls: 0.6940, loss_bbox: 1.2061, loss: 1.9001
2021-12-29 09:33:18,832 - mmdet - INFO - Epoch [1][650/1162]	lr: 2.000e-03, eta: 8:38:09, time: 0.532, data_time: 0.013, memory: 5142, loss_cls: 0.6794, loss_bbox: 1.1886, loss: 1.8680
2021-12-29 09:33:45,535 - mmdet - INFO - Epoch [1][700/1162]	lr: 2.000e-03, eta: 8:37:13, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 0.6674, loss_bbox: 1.0485, loss: 1.7159
2021-12-29 09:34:12,217 - mmdet - INFO - Epoch [1][750/1162]	lr: 2.000e-03, eta: 8:36:19, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 0.6646, loss_bbox: 1.0119, loss: 1.6765
2021-12-29 09:34:38,781 - mmdet - INFO - Epoch [1][800/1162]	lr: 2.000e-03, eta: 8:35:20, time: 0.531, data_time: 0.013, memory: 5142, loss_cls: 0.6487, loss_bbox: 0.9564, loss: 1.6051
2021-12-29 09:35:05,190 - mmdet - INFO - Epoch [1][850/1162]	lr: 2.000e-03, eta: 8:34:14, time: 0.528, data_time: 0.013, memory: 5142, loss_cls: 0.6176, loss_bbox: 0.8406, loss: 1.4582
2021-12-29 09:35:31,799 - mmdet - INFO - Epoch [1][900/1162]	lr: 2.000e-03, eta: 8:33:26, time: 0.532, data_time: 0.013, memory: 5142, loss_cls: 0.6210, loss_bbox: 0.9229, loss: 1.5439
2021-12-29 09:35:58,144 - mmdet - INFO - Epoch [1][950/1162]	lr: 2.000e-03, eta: 8:32:24, time: 0.527, data_time: 0.013, memory: 5142, loss_cls: 1.1693, loss_bbox: 1.1850, loss: 2.3543
2021-12-29 09:36:25,339 - mmdet - INFO - Exp name: tood_r50_fpn_on_input_1x_coco_cloth.py
2021-12-29 09:36:25,340 - mmdet - INFO - Epoch [1][1000/1162]	lr: 2.000e-03, eta: 8:32:14, time: 0.544, data_time: 0.013, memory: 5142, loss_cls: 1.2817, loss_bbox: 1.3174, loss: 2.5991
2021-12-29 09:36:52,114 - mmdet - INFO - Epoch [1][1050/1162]	lr: 2.000e-03, eta: 8:31:39, time: 0.535, data_time: 0.013, memory: 5142, loss_cls: 1.2358, loss_bbox: 1.2847, loss: 2.5205
2021-12-29 09:37:18,908 - mmdet - INFO - Epoch [1][1100/1162]	lr: 2.000e-03, eta: 8:31:07, time: 0.536, data_time: 0.013, memory: 5142, loss_cls: 1.2365, loss_bbox: 1.3173, loss: 2.5538
2021-12-29 09:37:45,867 - mmdet - INFO - Epoch [1][1150/1162]	lr: 2.000e-03, eta: 8:30:43, time: 0.539, data_time: 0.013, memory: 5142, loss_cls: 1.2022, loss_bbox: 1.2296, loss: 2.4319
2021-12-29 09:37:52,329 - mmdet - INFO - Saving checkpoint at 1 epochs
2021-12-29 09:38:47,804 - mmdet - INFO - Evaluating bbox...
2021-12-29 09:38:51,494 - mmdet - INFO - Exp name: tood_r50_fpn_on_input_1x_coco_cloth.py
2021-12-29 09:38:51,495 - mmdet - INFO - Epoch(val) [1][793]	bbox_mAP: 0.0170, bbox_mAP_50: 0.0560, bbox_mAP_75: 0.0090, bbox_mAP_s: -1.0000, bbox_mAP_m: 0.0240, bbox_mAP_l: 0.0190, bbox_mAP_copypaste: 0.017 0.056 0.009 -1.000 0.024 0.019
2021-12-29 09:39:21,128 - mmdet - INFO - Epoch [2][50/1162]	lr: 2.000e-03, eta: 8:27:14, time: 0.592, data_time: 0.062, memory: 5142, loss_cls: 1.2236, loss_bbox: 1.2423, loss: 2.4659
2021-12-29 09:39:47,839 - mmdet - INFO - Epoch [2][100/1162]	lr: 2.000e-03, eta: 8:26:45, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 1.2410, loss_bbox: 1.2517, loss: 2.4927
2021-12-29 09:40:14,530 - mmdet - INFO - Epoch [2][150/1162]	lr: 2.000e-03, eta: 8:26:16, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 1.2827, loss_bbox: 1.2900, loss: 2.5726
2021-12-29 09:40:41,392 - mmdet - INFO - Epoch [2][200/1162]	lr: 2.000e-03, eta: 8:25:54, time: 0.537, data_time: 0.013, memory: 5142, loss_cls: 1.2351, loss_bbox: 1.2374, loss: 2.4725
2021-12-29 09:41:08,168 - mmdet - INFO - Epoch [2][250/1162]	lr: 2.000e-03, eta: 8:25:28, time: 0.536, data_time: 0.013, memory: 5142, loss_cls: 1.1736, loss_bbox: 1.1955, loss: 2.3691
2021-12-29 09:41:34,806 - mmdet - INFO - Epoch [2][300/1162]	lr: 2.000e-03, eta: 8:24:57, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.2357, loss_bbox: 1.2372, loss: 2.4729
2021-12-29 09:42:01,528 - mmdet - INFO - Epoch [2][350/1162]	lr: 2.000e-03, eta: 8:24:29, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 1.2839, loss_bbox: 1.2587, loss: 2.5425
2021-12-29 09:42:28,154 - mmdet - INFO - Epoch [2][400/1162]	lr: 2.000e-03, eta: 8:23:58, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.2595, loss_bbox: 1.2359, loss: 2.4954
2021-12-29 09:42:54,986 - mmdet - INFO - Epoch [2][450/1162]	lr: 2.000e-03, eta: 8:23:35, time: 0.537, data_time: 0.013, memory: 5142, loss_cls: 1.2725, loss_bbox: 1.3049, loss: 2.5773
2021-12-29 09:43:21,637 - mmdet - INFO - Epoch [2][500/1162]	lr: 2.000e-03, eta: 8:23:05, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.2867, loss_bbox: 1.2862, loss: 2.5730
2021-12-29 09:43:48,377 - mmdet - INFO - Epoch [2][550/1162]	lr: 2.000e-03, eta: 8:22:38, time: 0.535, data_time: 0.013, memory: 5142, loss_cls: 1.2554, loss_bbox: 1.2227, loss: 2.4781
2021-12-29 09:44:15,013 - mmdet - INFO - Epoch [2][600/1162]	lr: 2.000e-03, eta: 8:22:08, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.2519, loss_bbox: 1.2955, loss: 2.5474
2021-12-29 09:44:42,014 - mmdet - INFO - Epoch [2][650/1162]	lr: 2.000e-03, eta: 8:21:49, time: 0.540, data_time: 0.013, memory: 5142, loss_cls: 1.2472, loss_bbox: 1.2727, loss: 2.5199
2021-12-29 09:45:08,675 - mmdet - INFO - Epoch [2][700/1162]	lr: 2.000e-03, eta: 8:21:20, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.1740, loss_bbox: 1.2461, loss: 2.4200
2021-12-29 09:45:35,666 - mmdet - INFO - Epoch [2][750/1162]	lr: 2.000e-03, eta: 8:21:00, time: 0.540, data_time: 0.013, memory: 5142, loss_cls: 1.2391, loss_bbox: 1.2960, loss: 2.5351
2021-12-29 09:46:02,395 - mmdet - INFO - Epoch [2][800/1162]	lr: 2.000e-03, eta: 8:20:33, time: 0.535, data_time: 0.013, memory: 5142, loss_cls: 1.2462, loss_bbox: 1.2470, loss: 2.4933
2021-12-29 09:46:29,543 - mmdet - INFO - Epoch [2][850/1162]	lr: 2.000e-03, eta: 8:20:17, time: 0.543, data_time: 0.013, memory: 5142, loss_cls: 1.2525, loss_bbox: 1.3128, loss: 2.5653
2021-12-29 09:46:56,271 - mmdet - INFO - Epoch [2][900/1162]	lr: 2.000e-03, eta: 8:19:50, time: 0.535, data_time: 0.013, memory: 5142, loss_cls: 1.2501, loss_bbox: 1.2733, loss: 2.5234
2021-12-29 09:47:22,898 - mmdet - INFO - Epoch [2][950/1162]	lr: 2.000e-03, eta: 8:19:19, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.3215, loss_bbox: 1.2575, loss: 2.5790
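
The jump in the log above (total loss going from 1.5439 at iteration 900 to 2.3543 at iteration 950 of epoch 1) can also be located automatically. The following is a minimal sketch that assumes the log line format shown above. The function name `find_loss_jumps` and the `ratio=1.3` threshold are illustrative choices, not anything provided by mmdet, and the sample lines are abbreviated copies of the real ones.

```python
import re

# Matches training lines such as "Epoch [1][950/1162] ... loss: 2.3543".
# "loss_cls:" and "loss_bbox:" do not match "loss: ", so only the total is captured.
LINE_RE = re.compile(r"Epoch \[(\d+)\]\[(\d+)/\d+\].*?loss: ([\d.]+)\s*$")


def find_loss_jumps(lines, ratio=1.3):
    """Return (epoch, iteration, previous_loss, loss) for each sudden increase."""
    jumps = []
    prev = None
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip checkpoint, evaluation and other non-training lines
        epoch, it, loss = int(m.group(1)), int(m.group(2)), float(m.group(3))
        if prev is not None and loss > ratio * prev:
            jumps.append((epoch, it, prev, loss))
        prev = loss
    return jumps


# Abbreviated copies of three lines from the log above.
sample = [
    "Epoch [1][850/1162] lr: 2.000e-03, loss_cls: 0.6176, loss_bbox: 0.8406, loss: 1.4582",
    "Epoch [1][900/1162] lr: 2.000e-03, loss_cls: 0.6210, loss_bbox: 0.9229, loss: 1.5439",
    "Epoch [1][950/1162] lr: 2.000e-03, loss_cls: 1.1693, loss_bbox: 1.1850, loss: 2.3543",
]
jumps = find_loss_jumps(sample)
print(jumps)
```

Knowing the iteration at which the loss jumps makes it easier to inspect the batches processed just before that point.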

I hit the same issue. My code already contains
"area1 = fp16_clamp((bboxes1[..., 2] - bboxes1[..., 0]), min=0) * fp16_clamp((
bboxes1[..., 3] - bboxes1[..., 1]), min=0)"
because I cloned the repository, so I did not have to modify anything.
The bug still occurs, though, and it appears at a random point in each training run.
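
For reference, the quoted line computes a box area with width and height clamped at zero, so a degenerate box yields an area of 0 rather than a negative or sign-flipped product. Below is a minimal plain-Python sketch of that arithmetic, assuming boxes are stored as `(x1, y1, x2, y2)`. `clamp_min` and `box_area` are hypothetical helpers; they do not reproduce the tensor or dtype handling of `fp16_clamp`.

```python
def clamp_min(value, minimum=0.0):
    """Return `value`, but never less than `minimum`."""
    return value if value > minimum else minimum


def box_area(box):
    """Area of an (x1, y1, x2, y2) box with width and height clamped at zero."""
    x1, y1, x2, y2 = box
    return clamp_min(x2 - x1) * clamp_min(y2 - y1)


valid = box_area((10.0, 20.0, 30.0, 50.0))       # width 20, height 30
degenerate = box_area((30.0, 20.0, 10.0, 50.0))  # x2 < x1, so width clamps to 0
print(valid, degenerate)
```

The clamp only guards against negative extents. It does not by itself explain or rule out other sources of invalid values, which is consistent with the report that the error still occurs with this line in place.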