arthurdouillard / CVPR2021_PLOP

Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation

Home Page: https://arxiv.org/abs/2011.11390

Clarification regarding domain shift experiments on Cityscapes

prachigarg23 opened this issue · comments

Hi @arthurdouillard, I really enjoyed reading your work! Thanks for bringing the domain-shift aspect into CSS. I have the following doubts about the implementation of ILT, MiB, and PLOP for the domain-shift experiments on Cityscapes (Table 5):

  1. Regarding PLOP: I'm assuming pseudo-labeling will not be applicable in these experiments, since the label space is fixed in the domain-incremental scenario. So do I just use the distillation loss along with regular cross-entropy (see the sketch after this list)? Is my understanding of how to use PLOP in a domain-IL scenario correct?
  2. MiB modifies the distillation and cross-entropy to tackle the background-class-shift issue. Since there is no such issue in the domain-incremental scenario, doesn't their method reduce to ILT (basically LwF)? I'm confused as to why there is a difference in performance (e.g., 59% for ILT vs. 61.5% for MiB in the 11-5 setting).
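For reference, a minimal sketch of what I mean in point 1: plain cross-entropy on the current domain plus a distillation term against the frozen old model, with no pseudo-labeling. The KL form and the weight here are placeholders, not necessarily what PLOP actually uses.

```python
import torch
import torch.nn.functional as F

def domain_incremental_loss(new_logits, old_logits, labels, kd_weight=1.0):
    # Plain cross-entropy on the current domain's ground truth
    # (no pseudo-labels, since the label space does not change).
    ce = F.cross_entropy(new_logits, labels, ignore_index=255)
    # Distillation against the frozen old model's output distribution.
    kd = F.kl_div(
        F.log_softmax(new_logits, dim=1),
        F.softmax(old_logits.detach(), dim=1),
        reduction="batchmean",
    )
    return ce + kd_weight * kd

# Toy usage: batch of 2, 19 Cityscapes classes, 64x64 crops.
labels = torch.randint(0, 19, (2, 64, 64))
new_logits = torch.randn(2, 19, 64, 64, requires_grad=True)
old_logits = torch.randn(2, 19, 64, 64)
domain_incremental_loss(new_logits, old_logits, labels).backward()
```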

Also, is it possible to share the mIoU of the joint model (the traditional, non-incremental segmentation model) that you get for Cityscapes with DeepLabV3 and ResNet-101? (I couldn't find this in the paper and wanted to see the drop relative to the joint model.)

Thank you for your interest in my work :)

  1. Yes, you're right.
  2. ILT has the KD loss (which is equivalent to MiB's KD in this particular experiment) but also an MSE between intermediate features; see the snippet below. In the config:
    opts.loss_de = 100
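To make that concrete, a minimal sketch of the extra feature-level term ILT adds on top of the KD, assuming a standard MSE over encoder activations (shapes and the exact hook point are illustrative, not necessarily what the repo does):

```python
import torch
import torch.nn.functional as F

# Frozen old-model features vs. current-model features (toy shapes:
# 2048-channel encoder output at 8x8 for a batch of 2).
old_feats = torch.randn(2, 2048, 8, 8)
new_feats = torch.randn(2, 2048, 8, 8, requires_grad=True)

# ILT's feature distillation: an MSE between old and new activations,
# scaled by opts.loss_de (= 100 here) and added to the CE + KD terms.
loss_de = 100.0
de_loss = loss_de * F.mse_loss(new_feats, old_feats.detach())
de_loss.backward()
```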

I don't think I ever ran the joint model on Cityscapes; you're right that it could be useful. If I find some spare GPUs, I'll run this experiment.

Thank you for the prompt reply!

  • Thanks for the clarification, I guess I missed the MSE in their best-performing model.
  • Yeah, I think the joint performance will indicate the actual difficulty of the domain-IL scenario: the amount of forgetting, etc.

Actually, I'm working on CSS on Cityscapes. I want to compare the drop in performance relative to the base model against your method. Please let me know if that's possible.

Hey,

So I didn't re-run anything new as I didn't have time for it, but I found some results:

First of all, in my follow-up paper (https://arxiv.org/abs/2106.15287) I used a resolution of 512x1024 for Cityscapes, while in the original paper (https://arxiv.org/abs/2011.11390) I used 512x512 (which makes less sense, because the images are originally rectangular, not square).

So with 512x1024 and 50 epochs, I got around 58.06 mIoU, so compare these results with my second paper (https://arxiv.org/abs/2106.15287). This is not super high and we could definitely do better, but I kept the same training schedule used by all models for simplicity.
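For concreteness, here is a hedged sketch of the two preprocessing choices (the actual transforms and augmentations in the code may differ):

```python
from torchvision import transforms

# Follow-up paper: keep Cityscapes' 2:1 aspect ratio by resizing to 512x1024.
rectangular = transforms.Compose([
    transforms.Resize((512, 1024)),
    transforms.ToTensor(),
])

# Original PLOP paper: square 512x512 crops, which distort/crop the 2:1 images.
square = transforms.Compose([
    transforms.Resize(512),      # shorter side to 512
    transforms.RandomCrop(512),
    transforms.ToTensor(),
])
```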

While not directly comparable to the Cityscapes results in the PLOP paper, does that answer your question?

Hi, thanks for getting back. Yeah, actually I was trying to reproduce the 77% mIoU performance on Cityscapes, as I need that for my experiments. I'm currently getting 70% and asked for PLOP's result to see whether it was 75%+. But I understand it depends on the training schedule used, so I'm trying to use the DeepLabV3 paper's hyperparameters.
Thanks for your help!

Hi @arthurdouillard, I have a small doubt. In the LwF and ILT experiments, loss_kd and loss_de have been set to 100, which I believe are the balancing weights of the distillation terms (the soft cross-entropy and the feature MSE) in the total loss. But in the LwF and ILT (ICCVW 2019) papers, I saw that this loss-balance weight is set to 1, not 100. Is there a reason for this? I was wondering if you could help resolve this confusion.
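For reference, this is how I understand those weights to enter the total objective (a minimal sketch only; the exact wiring in the repo may differ):

```python
# Hypothetical per-batch loss values, just to show the weighting.
ce_loss, kd_loss, de_loss = 0.90, 0.02, 0.005
loss_kd, loss_de = 100.0, 100.0   # the values used for LwF / ILT here

# Total objective: CE + loss_kd * KD + loss_de * feature MSE, so 100 scales
# the regularizers much harder than the weight of 1 in the original papers.
total = ce_loss + loss_kd * kd_loss + loss_de * de_loss
```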

For the baselines (like LwF and ILT), all hyperparameters (except the number of epochs) come from Cermelli et al.'s MiB. I didn't tune them, as Cermelli et al. had already tuned them for segmentation (although not on the same dataset, I agree).