GMvandeVen / continual-learning

PyTorch implementation of various methods for continual learning (XdG, EWC, SI, LwF, FROMP, DGR, BI-R, ER, A-GEM, iCaRL, Generative Classifier) in three different scenarios.


Joint training results different for different types of incremental learning?

toshi2k2 opened this issue

Isn't joint training defined as training on all the data at the same time? In that case, shouldn't the results be the same for all three scenarios of continual learning? However, the results from the code (and in the paper) differ between scenarios. Is joint training defined differently?

Joint training is indeed training on all the data at the same time, but it gives different results for the three continual learning scenarios because in each scenario the network must learn a different mapping. For Split MNIST, the different mappings that the network is supposed to learn are illustrated in Figure 2 of the accompanying article (https://www.nature.com/articles/s42256-022-00568-3#Fig2). Hope this helps!
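As a rough illustration (a minimal sketch, not code from this repository; the standard Split MNIST split into digit pairs is assumed), the inputs are identical in all three scenarios, but the target the network has to predict, and whether the context label is available, differ:

```python
digit = 7                   # original MNIST label (0-9)
context = digit // 2        # Split MNIST context: (0,1)->0, (2,3)->1, ..., (8,9)->4

# Task-incremental: context label is provided; predict the within-context label.
task_il_target = (context, digit % 2)   # e.g. (3, 1): "second class of context 3"

# Domain-incremental: context label is NOT provided; predict only the within-context label.
domain_il_target = digit % 2            # e.g. 1: binary decision, context unknown

# Class-incremental: context label is NOT provided; predict the global label.
class_il_target = digit                 # e.g. 7: full 10-way classification
```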

So, for joint training (task-incremental and domain-incremental), the output size is equal to the 'within-context' label size (for the above example, it's 2), and for class-incremental it's the 'global-label' size, which is 10 in the above case. Is my understanding correct?

For domain- and class-incremental learning that is correct. For task-incremental learning the output size is typically taken to be equal to the 'global-label' size, with the provided context label used to set only the output units of the classes in the current context to 'active' (i.e., a multi-head output layer).
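To make that concrete, here is a minimal, hypothetical sketch (not the implementation used in this repository; the per-context class ordering and variable names are assumptions) of a multi-head output layer realized by masking the inactive output units:

```python
import torch

num_classes = 10           # 'global-label' size for Split MNIST
classes_per_context = 2    # classes within each context

logits = torch.randn(4, num_classes)      # raw network outputs for a batch of 4
context = torch.tensor([0, 2, 2, 4])      # context label provided for each example

# Keep only the output units belonging to each example's context ("active head");
# set all other units to -inf so they cannot win the argmax / softmax.
masked = torch.full_like(logits, float('-inf'))
for i, c in enumerate(context.tolist()):
    lo, hi = c * classes_per_context, (c + 1) * classes_per_context
    masked[i, lo:hi] = logits[i, lo:hi]

pred = masked.argmax(dim=1)               # predictions restricted to the active head
```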