arthurdouillard / CVPR2021_PLOP

Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation

Home Page: https://arxiv.org/abs/2011.11390


NaNs on PLOP 15-5 setting on single GPU

jonasB00 opened this issue

Hi,
I think I found a bug in your code. As others have already pointed out: you can get NaN values in your network if you train PLOP on step 1. You can avoid this problem by using --opt_level O1. However, I wanted to strip apex out of your code, and that's when the NaNs become a real problem. I don't exactly know why, but apex seems to just skip these NaN values and carry on with training.

The problem lies in the computation of classif_adaptive_factor in train.py. It is computed as

classif_adaptive_factor = num / den

On rare occasions (when an image contains no pixels of any old class, e.g. it consists of a single new class), den becomes zero, since it is derived from mask_background:

mask_background = labels < self.old_classes
den = mask_background.float().sum(dim=(1, 2))
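
For what it's worth, here is a minimal sketch that reproduces the zero denominator (the tensor shapes and the old_classes value are made up for illustration; num in PLOP is also a per-image pixel count over background pixels, so it is zero for such images as well, and 0 / 0 yields NaN):

import torch

old_classes = 16                       # assumption: 15 base classes + background
labels = torch.full((2, 64, 64), 17)   # image 0 contains only new-class pixels
labels[1, :8, :8] = 3                  # image 1 contains some old-class pixels

mask_background = labels < old_classes
den = mask_background.float().sum(dim=(1, 2))
print(den)         # tensor([ 0., 64.])

num = torch.zeros_like(den)            # stand-in for PLOP's num, which is 0 here too
print(num / den)   # tensor([nan, 0.]) -> the NaN propagates into the loss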

The issue can easily be solved by modifying the den value right before computing the classif_adaptive_factor, for example via

for i in range(den.shape[0]):  # one entry per image; the last batch may be smaller than opts.batch_size
    if den[i] == 0:
        den[i] = 1
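
A loop-free equivalent of the snippet above (just a sketch on my side, not the repository's actual fix; it is safe because den is a per-image pixel count, so the only values below 1 are exact zeros):

den = den.clamp(min=1.0)  # only zero entries are changed
classif_adaptive_factor = num / den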

Either workaround is practical but not very satisfying. Maybe you can look into this particular section again and fix the issue properly. And I'd be really interested if someone has an idea why apex skips these NaNs.

Kind regards,
Jonas

Dataset: voc
Setting: 15-5 disjoint
Script (I didn't use torch.distributed):
python run.py --data_root data --batch_size 8 --dataset voc --name PLOP --task 15-5 --step 0 --lr 0.01 --epochs 30 --method PLOP
python run.py --data_root data --batch_size 8 --dataset voc --name PLOP --task 15-5 --step 1 --lr 0.001 --epochs 30 --method PLOP

Hey,

Have you tried the code with the epsilon added to the denominator? --> https://github.com/arthurdouillard/CVPR2021_PLOP/blob/main/train.py#L292

It should fix the problem.
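
Concretely, the idea at that line is to keep the denominator strictly positive, along these lines (the exact epsilon used in the repository may differ; 1e-8 here is illustrative):

classif_adaptive_factor = num / (den + 1e-8)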

Hi,
No, I had not seen this version before. I will use it from now on. Thank you!