GMvandeVen / continual-learning

PyTorch implementation of various methods for continual learning (XdG, EWC, SI, LwF, FROMP, DGR, BI-R, ER, A-GEM, iCaRL, Generative Classifier) in three different scenarios.


Performance

Johswald opened this issue

hey again!

when I execute
./main.py --ewc --online --lambda=5000 --gamma=1 --scenario task

this should be close to 99% accuracy, no?

For EWC and SI I get much worse performance with the default values.
What am I doing wrong? Thank you!

It is indeed the case that with those hyperparameter values, the performance of Online EWC on the split MNIST task protocol is rather bad. This confused me for quite a while as well. It turns out that on the split MNIST protocol, the hyperparameter values recommended by the methods' developers (the defaults here) don't work very well for SI, and especially not for EWC. For EWC and Online EWC, lambda even needs to be set several orders of magnitude larger. See also Appendix D and the footnote on page 7 of our paper: https://arxiv.org/pdf/1904.07734.pdf.
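For intuition, here is a minimal PyTorch sketch of how lambda and gamma enter the (Online) EWC loss. This is not the code from this repo; the names ewc_penalty, theta_star, and new_fisher are hypothetical, and the Fisher values are stand-ins:

import torch
import torch.nn as nn

def ewc_penalty(model, fisher, theta_star):
    # Quadratic EWC penalty: sum_i F_i * (theta_i - theta_i*)^2
    return sum((fisher[n] * (p - theta_star[n]) ** 2).sum()
               for n, p in model.named_parameters())

model = nn.Linear(784, 10)  # toy stand-in for the split MNIST network
theta_star = {n: p.detach().clone() for n, p in model.named_parameters()}  # params after previous task
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}      # dummy Fisher estimate

ewc_lambda, gamma = 1e8, 0.8   # the kind of values selected in the calls below
task_loss = torch.tensor(0.)   # stand-in for the current-task loss
loss = task_loss + (ewc_lambda / 2) * ewc_penalty(model, fisher, theta_star)

# Online EWC keeps one running Fisher estimate; after each task it is decayed
# by gamma before the newly estimated Fisher is added:
new_fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}  # stand-in
fisher = {n: gamma * fisher[n] + new_fisher[n] for n in fisher}

Because the penalty is an unnormalized sum over all parameters and the estimated Fisher values can be very small, the useful scale of lambda is protocol-dependent, which is one way to see why values as large as 1e8 appear in the calls further down.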

Thanks for this prompt response. OK, I had thought the default values were set to the ones that produce your reported accuracies. Would it be possible to share the calls? It's a bit hard to read the best values off your hyperparameter-search plots. Thanks again for this repo - it's really helpful!

Ah yes, sorry. It would indeed have been good to report those hyperparameter values somewhere. Here are all the calls with the values we selected:

For split MNIST:

./main.py --scenario=task --xdg=0.95
./main.py --scenario=task --ewc --lambda=10000000
./main.py --scenario=task --ewc --online --lambda=100000000 --gamma=0.8
./main.py --scenario=task --si --c=50

./main.py --scenario=domain --ewc --lambda=1000000
./main.py --scenario=domain --ewc --online --lambda=100000000 --gamma=0.7
./main.py --scenario=domain --si --c=500

./main.py --scenario=class --ewc --lambda=100000000
./main.py --scenario=class --ewc --online --lambda=1000000000 --gamma=0.8
./main.py --scenario=class --si --c=0.5

For permuted MNIST:

./main.py --experiment=permMNIST --tasks=10 --scenario=task --xdg=0.55
./main.py --experiment=permMNIST --tasks=10 --scenario=task --ewc --lambda=500
./main.py --experiment=permMNIST --tasks=10 --scenario=task --ewc --online --lambda=500 --gamma=0.8
./main.py --experiment=permMNIST --tasks=10 --scenario=task --si --c=5

./main.py --experiment=permMNIST --tasks=10 --scenario=domain --ewc --lambda=500
./main.py --experiment=permMNIST --tasks=10 --scenario=domain --ewc --online --lambda=1000 --gamma=0.9
./main.py --experiment=permMNIST --tasks=10 --scenario=domain --si --c=5

./main.py --experiment=permMNIST --tasks=10 --scenario=class --ewc --lambda=1
./main.py --experiment=permMNIST --tasks=10 --scenario=class --ewc --online --lambda=5 --gamma=1
./main.py --experiment=permMNIST --tasks=10 --scenario=class --si --c=0.1

Thank you again!

Hello there @GMvandeVen. I am trying to run the EWC and SI experiments with your hyperparameters, but when I use the following commands, the average precisions are poor.

./main.py --scenario=class --ewc --lambda=100000000
./main.py --scenario=class --si --c=0.5
./main.py --experiment=permMNIST --tasks=10 --scenario=class --ewc --lambda=1
./main.py --experiment=permMNIST --tasks=10 --scenario=class --si --c=0.1

However, the commands for the task scenario work well.

./main.py --scenario=task --ewc --lambda=10000000
./main.py --scenario=task --si --c=50
./main.py --experiment=permMNIST --tasks=10 --scenario=task --ewc --lambda=500
./main.py --experiment=permMNIST --tasks=10 --scenario=task --si --c=5

Any suggestion?

Hi @YeeCY, thanks for your interest in my code. The observation you describe is correct: EWC and SI do not work well with class-incremental learning (--scenario=class), even with their best hyperparameters, while they do work reasonably well with task-incremental learning (--scenario=task). See for example this paper (https://arxiv.org/abs/1904.07734) for more details on the difference between these scenarios. Hope this helps!
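For readers unfamiliar with the distinction, here is a minimal sketch (with illustrative values, not this repo's actual evaluation code) of what differs at test time in split MNIST:

import torch

logits = torch.randn(1, 10)       # toy network output for split MNIST (10 classes)
classes_per_task, task_id = 2, 3  # suppose the test image comes from task 3

# Task-incremental: task identity is given at test time, so only that task's
# two output units compete -- effectively a binary decision.
active = torch.arange(classes_per_task * (task_id - 1), classes_per_task * task_id)
task_il_pred = active[logits[0, active].argmax()].item()

# Class-incremental: no task identity, so all 10 output units compete and the
# network must implicitly infer the task as well -- this harder setting is
# where regularization methods such as EWC and SI break down.
class_il_pred = logits.argmax(dim=1).item()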

OK, that's a good summary; I will try running with task-incremental learning. By the way, would you mind providing the best hyperparameters for other algorithms, like A-GEM?