Training with cityscapes dataset and ohem loss

Question

kame-hameha opened this issue 4 years ago

Hello, I have been using your nice suite for a couple of days. I am running on two RTX GPUs (24 GB each) and hit a problem with the OHEM loss:
python3 train.py --dataset cityscapes --use_ohem --gpus 0,1 --batch_size 32 --num_worker 8

PyTorch version: 1.4 (the same error occurs with 1.1, though)

=====> input size:(512, 1024)
Namespace(batch_size=32, classes=19, cuda=True, dataset='cityscapes', gpus='0,1', input_size='512,1024', logFile='log.txt', lr=0.0005, lr_schedule='warmpoly', max_epochs=1000, model='ENet', num_cycles=1, num_workers=8, optim='adam', poly_exp=0.9, random_mirror=True, random_scale=True, resume='', savedir='./checkpoint/', train_type='trainval', use_focal=False, use_label_smoothing=False, use_lovaszsoftmax=False, use_ohem=True, warmup_factor=0.3333333333333333, warmup_iters=500)
=====> use gpu id: '0,1'
=====> set Global Seed:  1234
=====> building network
=====> computing network parameters and FLOPs
the number of parameters: 360422 ==> 0.36 M
find file:  ./dataset/inform/cityscapes_inform.pkl
length of dataset:  3475
length of dataset:  500
=====> Dataset statistics
data['classWeights']:  [ 1.4705521  9.505282  10.492059  10.492059  10.492059  10.492059
 10.492059  10.492059  10.492059  10.492059  10.492059  10.492059
 10.492059  10.492059  10.492059  10.492059  10.492059  10.492059
  5.131664 ]
mean and std:  [72.3924   82.90902  73.158325] [45.319206 46.15292  44.91484 ]
w/ class balance
torch.cuda.device_count()= 2
=====> beginning training
=====> the number of iterations per epoch:  108
Traceback (most recent call last):
  File "train.py", line 398, in <module>
    train_model(args)
  File "train.py", line 215, in train_model
    lossTr, lr = train(args, trainLoader, model, criteria, optimizer, epoch)
  File "train.py", line 327, in train
    loss = criterion(output, labels)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/git/Dummy-Efficient-Segmentation-Networks/utils/losses/loss.py", line 192, in forward
    prob = prob.masked_fill_(1 - valid_mask, 1)     #
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 394, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: Subtraction, the `-` operator, with

No error occurs if I use focal loss (FocalLoss2d)...
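
For reference, here is a minimal sketch of what the failing line appears to be doing. The traceback points at `1 - valid_mask` in `loss.py`. If `valid_mask` is a bool tensor (comparison ops return `torch.bool` on PyTorch 1.2 and later), subtraction is not supported on it, and the mask can be inverted with `~` instead. The names `prob` and `valid_mask` come from the traceback; `target`, `ignore_label`, the value 255, and the tensor contents are assumptions for illustration, not the repository's actual code.

```python
import torch

# Hypothetical stand-ins for the tensors inside the OHEM loss.
# The ignore value 255 and the sample data are assumptions.
ignore_label = 255
target = torch.tensor([0, 3, 255, 7, 255])
prob = torch.tensor([0.9, 0.2, 0.5, 0.7, 0.1])

# Comparison ops return a torch.bool tensor on PyTorch >= 1.2.
valid_mask = target.ne(ignore_label)

# `1 - valid_mask` raises a RuntimeError for bool tensors;
# `~` (logical not on bool tensors) inverts the mask instead.
invalid_mask = ~valid_mask

# Set the probability of ignored pixels to 1 so they are never
# selected as hard examples.
prob = prob.masked_fill(invalid_mask, 1.0)
print(prob)
```

Whether this is the right change for `loss.py` depends on how `valid_mask` is built there, so treat it as a starting point rather than a confirmed fix.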