chongruo / detectron2-ResNeSt

A fork of Detectron2 with ResNeSt backbone

Home Page: https://arxiv.org/abs/2004.08955

This implementation yields shape mismatch errors for Semantic Segmentation when Radix > 1

Cyril9227 opened this issue:

Instructions To Reproduce the Issue:

  1. What changes you made (git diff) or what code you wrote:

I just moved the keys specific to ResNeSt under MODEL.RESNEST for better clarity.
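For reference, a minimal sketch of what that regrouping might look like using detectron2's CfgNode; only RADIX appears in the config below, so any other key names here would be assumptions for illustration:

# Sketch of grouping the ResNeSt-specific defaults under MODEL.RESNEST.
# Only RADIX is exercised by the config in this report.
from detectron2.config import CfgNode as CN

def add_resnest_config(cfg):
    cfg.MODEL.RESNEST = CN()
    cfg.MODEL.RESNEST.RADIX = 2  # read by the backbone instead of a key kept elsewhere

The backbone-building code then looks up cfg.MODEL.RESNEST.RADIX wherever it previously read the radix from another part of the config.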

  2. What exact command you ran:
    Standard training using this config file:
_BASE_: "Base-SemSeg.yaml"

MODEL:
  WEIGHTS: "https://hangzh.s3-us-west-1.amazonaws.com/encoding/models/resnest200_detectron-02644020.pth"
  BACKBONE:
    NAME: "build_resnest_fpn_backbone"
  RESNEST:
    RADIX: 2
  RESNETS:
    DEPTH: 50
    STRIDE_IN_1X1: False
    NORM: "SyncBN" 
  FPN:
    NORM: "SyncBN"
  ROI_HEADS:
    NAME: CascadeROIHeads
  ROI_BOX_HEAD:
    NAME: "FastRCNNConvFCHead"
    NUM_CONV: 4
    NUM_FC: 1
    NORM: "SyncBN"
    CLS_AGNOSTIC_BBOX_REG: True
  SEM_SEG_HEAD:
    NORM: "SyncBN"
    NUM_CLASSES: 3
  RPN:
    POST_NMS_TOPK_TRAIN: 2000
  PIXEL_MEAN: [123.68, 116.779, 103.939]
  PIXEL_STD: [58.393, 57.12, 57.375]
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.02
  STEPS: (240000, 255000)
  MAX_ITER: 270000
TEST:
  PRECISE_BN:
    ENABLED: True
  AUG:
    ENABLED: True

Base-SemSeg.yaml contains the dataset registration and the "SemanticSegmentor"-related entries.
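Base-SemSeg.yaml itself is not shown; for reference, the dataset registration it refers to typically looks something like the following sketch (dataset name, file paths, and class names are placeholders, with stuff_classes chosen to match NUM_CLASSES: 3):

# Hypothetical registration of a custom semantic-segmentation dataset in detectron2.
from detectron2.data import DatasetCatalog, MetadataCatalog

def get_my_semseg_dicts():
    # Each record needs "file_name", "height", "width" and a per-pixel
    # label image referenced by "sem_seg_file_name".
    return [
        {
            "file_name": "images/0001.jpg",
            "sem_seg_file_name": "masks/0001.png",
            "height": 544,
            "width": 800,
        },
        # ...
    ]

DatasetCatalog.register("my_semseg_train", get_my_semseg_dicts)
MetadataCatalog.get("my_semseg_train").set(
    stuff_classes=["class_a", "class_b", "class_c"],
    ignore_label=255,
)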

  3. What you observed (including full logs):
Traceback (most recent call last):
  File "/home/jupyter-cyril/venv_detectron2/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/home/jupyter-cyril/venv_detectron2/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 215, in run_step
    loss_dict = self.model(data)
  File "/home/jupyter-cyril/venv_detectron2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jupyter-cyril/venv_detectron2/lib/python3.6/site-packages/detectron2/modeling/meta_arch/semantic_seg.py", line 81, in forward
    results, losses = self.sem_seg_head(features, targets)
  File "/home/jupyter-cyril/venv_detectron2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jupyter-cyril/venv_detectron2/lib/python3.6/site-packages/detectron2/modeling/meta_arch/semantic_seg.py", line 167, in forward
    F.cross_entropy(x, targets, reduction="mean", ignore_index=self.ignore_value)
  File "/home/jupyter-cyril/venv_detectron2/lib/python3.6/site-packages/torch/nn/functional.py", line 2021, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/jupyter-cyril/venv_detectron2/lib/python3.6/site-packages/torch/nn/functional.py", line 1840, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: input and target batch or spatial sizes don't match: target [16 x 544 x 800], input [16 x 3 x 800 x 544] at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:23
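
For reference, the failure can be reproduced in isolation: F.cross_entropy requires the target mask to have the same spatial size as the input logits, and here the two have their height and width swapped. A minimal sketch with a smaller batch but the sizes from the traceback:

import torch
import torch.nn.functional as F

# Logits are N x C x H x W = 16 x 3 x 800 x 544 in the traceback, but the target
# mask is N x H x W = 16 x 544 x 800, i.e. H and W are transposed.
logits = torch.randn(2, 3, 800, 544)
target = torch.randint(0, 3, (2, 544, 800))

try:
    F.cross_entropy(logits, target, reduction="mean", ignore_index=255)
except RuntimeError as e:
    print(e)  # the same kind of shape-mismatch error as above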

Environment:

Output of detectron2's environment collection script:

sys.platform              linux
Python                    3.6.7 | packaged by conda-forge | (default, Nov 21 2018, 02:32:25) [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
numpy                     1.18.3
detectron2                0.1.1 @/home/jupyter-cyril/venv_detectron2/lib/python3.6/site-packages/detectron2
detectron2 compiler       GCC 7.3
detectron2 CUDA compiler  10.0
detectron2 arch flags     sm_35, sm_37, sm_50, sm_52, sm_60, sm_61, sm_70, sm_75
DETECTRON2_ENV_MODULE     <not set>
PyTorch                   1.4.0+cu100 @/home/jupyter-cyril/venv_detectron2/lib/python3.6/site-packages/torch
PyTorch debug build       False
CUDA available            True
GPU 0                     Tesla T4
CUDA_HOME                 /usr/local/cuda
NVCC                      Cuda compilation tools, release 10.0, V10.0.130
Pillow                    6.2.1
torchvision               0.5.0+cu100 @/home/jupyter-cyril/venv_detectron2/lib/python3.6/site-packages/torchvision
torchvision arch flags    sm_35, sm_50, sm_60, sm_70, sm_75
cv2                       4.2.0
------------------------  ----------------------------------------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.0
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

OK, the error seems to be caused by the random-rotation data augmentation rather than by Radix > 1: in the traceback the target is 16 x 544 x 800 while the input is 16 x 3 x 800 x 544, i.e. the spatial dimensions of the image were swapped by the rotation but the mask kept its original shape.
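
If that is the cause, the fix is to apply the same geometric transforms to the semantic-segmentation ground truth as to the image. A sketch of the relevant part of a dataset mapper, loosely following what detectron2's default DatasetMapper does with "sem_seg_file_name" (whether a custom mapper is involved here is an assumption):

import copy
import numpy as np
import torch
from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T

def map_with_synced_sem_seg(dataset_dict, tfm_gens):
    # Apply the *same* transforms (including any rotation) to image and mask,
    # so their spatial sizes cannot diverge (e.g. 800x544 image vs 544x800 mask).
    dataset_dict = copy.deepcopy(dataset_dict)
    image = utils.read_image(dataset_dict["file_name"], format="BGR")
    image, transforms = T.apply_transform_gens(tfm_gens, image)
    dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(image.transpose(2, 0, 1)))
    if "sem_seg_file_name" in dataset_dict:
        sem_seg = utils.read_image(dataset_dict.pop("sem_seg_file_name"), "L").squeeze(2)
        sem_seg = transforms.apply_segmentation(sem_seg)  # keeps the mask in sync
        dataset_dict["sem_seg"] = torch.as_tensor(sem_seg.astype("long"))
    return dataset_dict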