fcdl94 / MiB

Official code for Modeling the Background for Incremental Learning in Semantic Segmentation https://arxiv.org/abs/2002.00718

Questions about the VOC 15-1 Disjoint and Overlapped results with MiB?

wuyujack opened this issue · comments

Hi @fcdl94, sorry for bothering you again; since these questions are about VOC, I opened a new issue instead. I have re-run the disjoint and overlapped experiments for all the methods included in the paper's Table 1. The ranking of the results is the same, so the conclusions and claims of the paper hold. However, when I list the MiB results in the sheet, something weird happens.

First, let me include my command lines for replication as follows. Both settings are trained from scratch, so steps 1-5 of each setting start from that setting's own step-0 checkpoint:

  • overlapped:

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 0 --lr 0.01 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 1 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 2 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 3 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 4 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 5 --lr 0.001 --epochs 30 --method MiB

  • disjoint:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 0 --lr 0.01 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 1 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 2 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 3 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 4 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 5 --lr 0.001 --epochs 30 --method MiB
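For convenience (this is my own scripting, not part of the repo), the six disjoint invocations differ only in `--step` and the step-0 learning rate, so they can be driven from one loop. A POSIX-shell sketch, where the helper name and the `DRY_RUN` convention are invented for the example:

```shell
# Sketch (not from the repo): drive the six disjoint steps from one loop.
# DRY_RUN=echo prints the commands instead of launching them, so no GPU,
# dataset, or torch install is needed to inspect the generated invocations.
run_disjoint_steps() {
    NAME=test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913
    for STEP in 0 1 2 3 4 5; do
        # step 0 trains the base model with lr 0.01; later steps use 0.001
        if [ "$STEP" -eq 0 ]; then LR=0.01; else LR=0.001; fi
        ${DRY_RUN:-} env CUDA_VISIBLE_DEVICES=0,1 \
            python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 \
            run.py --data_root data --batch_size 12 --dataset voc --name "$NAME" \
            --task 15-5s --step "$STEP" --lr "$LR" --epochs 30 --method MiB
    done
}
# e.g. DRY_RUN=echo run_disjoint_steps
```

The overlapped runs are identical apart from the experiment name and the added `--overlap` flag.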

I get the results after all the steps are completed, as shown below:

| Setting | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | all except 0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MiB (Disjoint) | 84.61 | 16.49 | 31.68 | 58.00 | 27.89 | 43.96 | 1.02 | 15.31 | 73.63 | 1.05 | 35.17 | 22.15 | 65.75 | 54.63 | 28.12 | 80.94 | 0.09 | 24.38 | 13.66 | 15.79 | 17.38 | 31.35 |
| MiB (Overlapped) | 84.73 | 15.17 | 27.21 | 45.64 | 21.93 | 42.31 | 3.06 | 46.96 | 77.94 | 4.06 | 36.41 | 37.46 | 64.53 | 43.26 | 28.14 | 80.49 | 0.02 | 25.04 | 17.06 | 14.06 | 15.09 | 32.29 |

Results Summary:

  • For the Disjoint setting, the 1-15 class mIoU is 37.05 and the 16-20 class mIoU is 14.26, giving an overall ("all except 0") mIoU of 31.35;

  • For the Overlapped setting, the 1-15 class mIoU is 38.30 and the 16-20 class mIoU is 14.25, giving an overall mIoU of 32.29.
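As a quick sanity check on the averaging (a throwaway script of mine, not part of the repo), the summary numbers follow directly from the disjoint per-class row above:

```python
# Verify the disjoint summary: per-class IoU for classes 0..20 (from the
# table above), averaged over 1-15, 16-20, and 1-20 ("all except 0").
iou = [84.61, 16.49, 31.68, 58.00, 27.89, 43.96, 1.02, 15.31, 73.63,
       1.05, 35.17, 22.15, 65.75, 54.63, 28.12, 80.94, 0.09, 24.38,
       13.66, 15.79, 17.38]

mean = lambda xs: sum(xs) / len(xs)
print(round(mean(iou[1:16]), 2))   # classes 1-15  -> 37.05
print(round(mean(iou[16:21]), 2))  # classes 16-20 -> 14.26
print(round(mean(iou[1:21]), 2))   # all except 0  -> 31.35
```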

Compared with the Table 2 results: for the Disjoint setting, you report 46.2 mIoU for 1-15 and 12.9 for 16-20, i.e. 37.9 overall; for the Overlapped setting, you report 35.1 mIoU for 1-15 and 13.5 for 16-20, i.e. 29.7 overall.

Therefore,

  • The first problem is that the largest discrepancy is in the Disjoint setting: for classes 1-15, mine is 37.05 while yours is 46.2;
  • The second problem is that in my results the Overlapped setting outperforms the Disjoint setting, which contradicts the Table 2 results.

Is there anything I missed in reproducing the VOC 15-1 results with MiB?

FYI, I attach the raw per-class IoU data of each step for each setting:

  • Disjoint:

Step 0:

T-IoU 0.9428957387543162 0.9093635384910007 0.407387220528084 0.9114261479178447 0.6930536991709606 0.8098990488060078 0.951383562635301 0.9017522187102619 0.9226235898934364 0.460997208347546 0.8647273405405083 0.5378474228922152 0.8973456596703172 0.867327442844769 0.8703716247790984 0.8676089249667962

Step 1:

T-IoU 0.9167589144636081 0.8873648605178784 0.4092252459852222 0.869479481025851 0.6806551633049537 0.7638765341517527 0.9205706473134169 0.8486735934421107 0.915373926952166 0.43939618475723113 0.8347598489209404 0.503665661560724 0.8803754274877195 0.8143450733616353 0.8425866785715388 0.8478188317980148 0.17931164855409873

Step 2:

T-IoU 0.9124010280617076 0.7157312195506308 0.3758244658997688 0.8048439599053759 0.5128759972974389 0.7772130550282125 0.8140185464753737 0.7540073558194968 0.8067052172080738 0.22098235839790872 0.24187419224161838 0.41607016629238047 0.7928031218145428 0.7074281456336383 0.8009249524418639 0.8291462394258059 0.08864073567935006 0.234592279843589

Step 3:

T-IoU 0.8605716951061348 0.7427626686495384 0.3754501232161909 0.809016676048648 0.5147889505056727 0.6270363019643613 0.7184538662827296 0.7334033091838555 0.8304927366076473 0.1250355660764828 0.41305395553844215 0.44932921473209253 0.787892298275096 0.7264193954508246 0.7572718800408372 0.8392311512765394 0.04363873022939195 0.15701272132864186 0.15189688631990797

Step 4:

T-IoU 0.8617972763749628 0.20872328663156442 0.3631064199006128 0.5886765010309403 0.34041861509825144 0.5474293754913939 0.13570653927230805 0.5574966269754023 0.64431626024196 0.031194197377586146 0.4463714388961355 0.24921212066929457 0.6278902980821934 0.5785498297789483 0.504660152102315 0.7900153928512044 0.011786844749089675 0.20088916659028505 0.12481700285889741 0.19506359566450185

Step 5:

T-IoU 0.8460856346827669 0.16490433429984022 0.31678143548785276 0.5799742617085539 0.2789076000769832 0.4395905332124107 0.01023589863212125 0.1530806668255949 0.7362783346863634 0.010541665752975061 0.35172514843737246 0.2214938353272161 0.6574922122682622 0.5462862022895337 0.28115039555445903 0.8093942994334622 0.0009336352682833694 0.24378756157086445 0.1365743221933673 0.15787678203324385 0.17377045020350113

  • Overlapped:

Step 0:

T-IoU 0.942351454969199 0.9088923380558995 0.41445877002346937 0.8943720292308379 0.7216543435352611 0.8245976268597867 0.9404349006553606 0.9077450313931694 0.9266660651840857 0.4700834557332964 0.857335799102383 0.5793174660444275 0.8987260795649801 0.8612262856604168 0.8698300613490206 0.8655979116146679

Step 1:

T-IoU 0.9182269199439186 0.8792879987006258 0.4017180006248381 0.8441055742619886 0.6994568677193088 0.7797706674159932 0.9149230512624791 0.8669237437597723 0.925295289580988 0.4621438312184923 0.8278570724749236 0.5046357276965907 0.8827558057428029 0.8319480561554519 0.8351552396152283 0.8531293756555369 0.20081410045276016

Step 2:

T-IoU 0.9102123086486242 0.6716115648361257 0.3323608949874176 0.7409822233592194 0.4959355955312635 0.7102449043004382 0.8138243088151255 0.7864668388252009 0.8396605082671436 0.19756995217299297 0.47104696504611043 0.3864640364884321 0.7830237173154856 0.6534901834317308 0.8130206998712594 0.8267346598625166 0.2135801990714857 0.2739649761053714

Step 3:

T-IoU 0.8632306048394159 0.705974539166515 0.36605442930053417 0.7172314140174203 0.47740362529902175 0.6360963911559759 0.7891308674752905 0.8167691950759172 0.8518501315621179 0.1859484531773666 0.5689216234786797 0.4579828893103606 0.7713658863930264 0.6804223882563137 0.7407504318523732 0.8474278492117775 0.14708674311949124 0.2741278598946114 0.18034994422937553

Step 4:

T-IoU 0.8647615150869634 0.12248332643577793 0.29494393825587 0.5665025942424852 0.25189086565151464 0.504964955569028 0.2607169020464405 0.6373994602886512 0.6749896193770941 0.038352339739995286 0.3740172433656488 0.33634831372874185 0.584688752624233 0.40671331551812523 0.32672648567942325 0.7970850159150706 0.009704602104938941 0.20900667311557677 0.14287384004333556 0.18019787539515425

Step 5:

T-IoU 0.847341876904206 0.1516912277406037 0.27214761938518534 0.4564398366880403 0.21925902115097656 0.4230704428405192 0.03062049153551857 0.4696080316553439 0.7793639447531278 0.04059292675086336 0.3641082518756912 0.3746037758239177 0.6453468608880033 0.43260491849114996 0.28142230199939966 0.8048656345565484 0.0001993869309182823 0.2504386187169609 0.17055907785579358 0.14061286493441055 0.1509222886875183

Hi @wuyujack.
This is weird; I never noticed this in my experiments.
Have you changed anything in the code? Did you use the provided splits?
One thing that may differ from my experiments is that I directly used the pretrained model of the 15-5 task for the 15-1 task. The data should be the same, but maybe there is something wrong with the splits. Can you check what happens when using the 15-5 pretrained model?

@fcdl94 Thank you for the prompt reply! The code and the provided splits are unchanged. I will get back to you once I have the 15-1 results using the step-0 pretrained model from 15-5.

Hi @fcdl94, I have used the pretrained model of the 15-5 task for the 15-1 task, but my results are almost the same as in my previous comment. The command lines are shown here:

  • Disjoint:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_0913_with_Step0_pretrained_from_15-5 --step_ckpt checkpoint/step/15-5-voc_test_MIB_voc_15_5_lr_0.01_with_pretrained_0913_0.pth --task 15-5s --step 1 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_0913_with_Step0_pretrained_from_15-5 --task 15-5s --step 2 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_0913_with_Step0_pretrained_from_15-5 --task 15-5s --step 3 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_0913_with_Step0_pretrained_from_15-5 --task 15-5s --step 4 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_0913_with_Step0_pretrained_from_15-5 --task 15-5s --step 5 --lr 0.001 --epochs 30 --method MiB

  • Overlapped:

CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --master_port 1996 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_overlapped_with_Step0_pretrained_from_15-5 --step_ckpt checkpoint/step/15-5-voc_test_MIB_voc_15_5_lr_0.01_with_pretrained_0913_overlapped_0.pth --task 15-5s --overlap --step 1 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --master_port 1996 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_overlapped_with_Step0_pretrained_from_15-5 --task 15-5s --overlap --step 2 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --master_port 1996 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_overlapped_with_Step0_pretrained_from_15-5 --task 15-5s --overlap --step 3 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --master_port 1996 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_overlapped_with_Step0_pretrained_from_15-5 --task 15-5s --overlap --step 4 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --master_port 1996 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_overlapped_with_Step0_pretrained_from_15-5 --task 15-5s --overlap --step 5 --lr 0.001 --epochs 30 --method MiB

For the 15-5:

  • Disjoint:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5_lr_0.01_with_pretrained_0913 --task 15-5 --step 0 --lr 0.01 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5_lr_0.01_with_pretrained_0913 --task 15-5 --step 1 --lr 0.001 --epochs 30 --method MiB

  • Overlapped:

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5_lr_0.01_with_pretrained_0913_overlapped --task 15-5 --overlap --step 0 --lr 0.01 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5_lr_0.01_with_pretrained_0913_overlapped --task 15-5 --overlap --step 1 --lr 0.001 --epochs 30 --method MiB

That's weird.
You are using the correct commands; it's strange that the results are so different. Maybe it's just the noise of the setting, but from my experience it should not change too much (otherwise, I would have performed multiple runs).
Have you tried a different random seed? Just to see how much it changes.

@fcdl94 I did not use a different random seed, but I can give it a try and will get back to let you know.

@fcdl94 For the Disjoint setting, I tried different random seeds, 100 and 200. The results for each seed are:

  • Random seed 100: the final mIoU is 33.09, the 1-15 mIoU is 39.39, and the 16-20 mIoU is 14.19.

  • Random seed 200: the final mIoU is 30.92, the 1-15 mIoU is 36.61, and the 16-20 mIoU is 13.85.

Comments:

  • From my perspective, the random seed may change the performance a bit, mainly for the base classes.

The corresponding command lines:

  • Random seed 100:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_RandomSeed_100 --task 15-5s --step 0 --lr 0.01 --epochs 30 --method MiB --random_seed 100

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_RandomSeed_100 --task 15-5s --step 1 --lr 0.001 --epochs 30 --method MiB --random_seed 100

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_RandomSeed_100 --task 15-5s --step 2 --lr 0.001 --epochs 30 --method MiB --random_seed 100

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_RandomSeed_100 --task 15-5s --step 3 --lr 0.001 --epochs 30 --method MiB --random_seed 100

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_RandomSeed_100 --task 15-5s --step 4 --lr 0.001 --epochs 30 --method MiB --random_seed 100

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_RandomSeed_100 --task 15-5s --step 5 --lr 0.001 --epochs 30 --method MiB --random_seed 100

  • Random Seed 200:

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2010 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_RandomSeed_200 --task 15-5s --step 0 --lr 0.01 --epochs 30 --method MiB --random_seed 200

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2010 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_RandomSeed_200 --task 15-5s --step 1 --lr 0.001 --epochs 30 --method MiB --random_seed 200

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2010 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_RandomSeed_200 --task 15-5s --step 2 --lr 0.001 --epochs 30 --method MiB --random_seed 200

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2010 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_RandomSeed_200 --task 15-5s --step 3 --lr 0.001 --epochs 30 --method MiB --random_seed 200

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2010 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_RandomSeed_200 --task 15-5s --step 4 --lr 0.001 --epochs 30 --method MiB --random_seed 200

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2010 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_RandomSeed_200 --task 15-5s --step 5 --lr 0.001 --epochs 30 --method MiB --random_seed 200

Hi @wuyujack, sorry for the delay in answering.
I finally understand the problem.
We did not notice that, during validation, one hyper-parameter of our method was different: the loss_kd for the 15-1 scenario was 100, not 10 as in the other scenarios. When adding the "shortcut" to reproduce the results (i.e., using just --method MiB instead of --unce --unkd --loss_kd 10 (or 100) --init_balanced), I forgot that in this one case it had to be set to 100.
Can you try using --unce --unkd --loss_kd 100 --init_balanced instead of --method MiB, both for overlapped and disjoint?
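For concreteness, step 1 of the disjoint run with the shortcut expanded might look as below (my own sketch: the experiment name is invented, and the flag set simply follows the suggestion above, with loss_kd at 100 for 15-1):

```shell
# Sketch: step-1 disjoint command with --method MiB replaced by its
# component flags and loss_kd set to 100 for the 15-1 scenario.
# DRY_RUN=echo prints the command instead of launching it.
mib_15_1_step1() {
    ${DRY_RUN:-} env CUDA_VISIBLE_DEVICES=0,1 \
        python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 \
        run.py --data_root data --batch_size 12 --dataset voc \
        --name test_MIB_voc_15_5s_kd100 --task 15-5s --step 1 --lr 0.001 --epochs 30 \
        --unce --unkd --loss_kd 100 --init_balanced
}
# e.g. DRY_RUN=echo mib_15_1_step1
```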


Sure, and thanks for your reply! I will get back to you with the new results.


BTW, could you also confirm that the configurations for the other settings, such as 100-10 and 50-50-50, are correct based on the provided commands? Thanks in advance!

Yes, in all the settings the loss_kd parameter was 10; only in the 15-1 case we used 100.

Hi @fcdl94, with loss_kd set to 100, I got the following results with different random seeds.

  • For the Disjoint setting:

    Random seed 42: 1-15: 43.04%; 16-20: 13.77%; all except 0 (background): 35.73%
    Random seed 100: 1-15: 38.15%; 16-20: 15.22%; all except 0 (background): 32.41%
    With step 0 using the pretrained model from 15-5: 1-15: 38.61%; 16-20: 12.51%; all except 0 (background): 32.08%

  • For the Overlapped setting:

    Random seed 42: 1-15: 42.15%; 16-20: 14.18%; all except 0 (background): 35.16%
    With step 0 using the pretrained model from 15-5: 1-15: 37.30%; 16-20: 13.87%; all except 0 (background): 31.44%

It seems the Disjoint result can approach the paper's result with the new loss_kd parameter, though it still varies with the random seed.

Yes, it is something that happens when training on only one class at a time (as for 19-1). Thanks for the update.

Hi @fcdl94, BTW I have another question. Since you only save the image indices in train-{}.npy, and many images contain several semantic labels, how do you specify the ground-truth label for each image during training given only the image index, for both the VOC and ADE20k datasets? This matters especially for the 15-5, 100-10, and 100-50 settings, where several task labels are included in one step. Also, do you have the corresponding ground-truth labels for each train-{}.npy saved as a .npy file for reference? For example, for each label in step 1 of 100-10a in ADE20k, how many images are sampled for each of the labels 101-110?

Hi @wuyujack.
When creating the dataset, I map the labels which are not of interest (old classes or yet-unseen ones) to the correct value (the background). This is done in the dataset file, e.g. for ADE20K:

lambda t: t.apply_(lambda x: self.inverted_order[x] if x in self.labels else masking_value))

Regarding the number of images in each step, we used a heuristic to split the dataset, implemented by my colleague @mancinimassimiliano. The code to split the dataset can be found here: https://github.com/fcdl94/MiB/blob/master/ADE-Split.ipynb.

I hope I have interpreted your questions correctly.
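The remapping can be illustrated in isolation (a toy sketch, not the repo's actual dataset code: the contents of `inverted_order` and the current-task label set below are invented for the example):

```python
import torch

# Toy illustration of the target transform: labels of the current task are
# remapped through `inverted_order`; every other label (old classes, unseen
# classes) is collapsed to the masking value (background, 0).
inverted_order = {0: 0, 101: 1, 103: 2}  # invented: step classes 101 and 103
labels = set(inverted_order)             # labels kept in this step
masking_value = 0

target = torch.tensor([0, 5, 101, 87, 103, 101], dtype=torch.long)
# apply_ runs the Python lambda element-wise, in place (CPU tensors only)
target.apply_(lambda x: inverted_order[x] if x in labels else masking_value)
print(target.tolist())  # -> [0, 0, 1, 0, 2, 1]
```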


Hi @fcdl94, thanks for your reply. The line 256 you shared is exactly what I tested yesterday. I found that when we use the __getitem__ method to retrieve an example from the dataset, the target sometimes has more than two unique labels, and the target may also vary if you call __getitem__ again, since the transform includes a random crop: some of the pixels with the labels we want may not fall inside the cropped sample.

Therefore, for the ADE20k dataset and the 100-10 setting, a sample may have several ground-truth semantic labels during training (all belonging to the new label set of that step), right? For example, in step 1 (incremental labels 101-110) of 100-10a, given image 16941 (index into the full data), the original image has the unique labels array([0, 2, 3, 5, 7, 10, 13, 18, 21, 44, 53, 70, 87, 88, 101, 103], dtype=uint8). After line 256 and the transform, __getitem__ may return a target whose unique values are tensor([0, 101, 103], dtype=torch.uint8), which means we are using a single image but training for more than one incremental label (all included in that step's new label set) at once. Please correct me if I am wrong.

In an incremental step, all the labels belonging to that step are considered. We don't train an image for just one label: we use all labeled pixels that belong to the current task classes and set the others to zero.

Yes, getting the same index may yield different labels, since the image and the label are randomly cropped, after which the target transform (the label mapping) is applied.

I see and thanks for your clarification!

@fcdl94 BTW, if I want to use standard BN for training, where can I find the pretrained model for ResNet-101? The inplaceABN repo does not provide one.

You can use the PyTorch one, but you need to modify the model to match its definition.

You mean the ResNet-101 pretrained model from here?

'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth'

Yes, however, you need to reimplement it to add the dilations and to match the variable names