GMvandeVen / continual-learning

PyTorch implementation of various methods for continual learning (XdG, EWC, SI, LwF, FROMP, DGR, BI-R, ER, A-GEM, iCaRL, Generative Classifier) in three different scenarios.


Joint training results different for different types of incremental learning?

toshi2k2 opened this issue

Isn't joint training defined as training on all the data at the same time? In that case, shouldn't the results be the same for all three scenarios of continual learning? However, the results from the code (and in the paper) differ between scenarios. Is joint training defined differently?

Joint training is indeed training on all the data at the same time, but it gives different results for the three continual learning scenarios because in each scenario the network must learn a different mapping. For Split MNIST, the different mappings that the network is supposed to learn are illustrated in Figure 2 of the accompanying article (https://www.nature.com/articles/s42256-022-00568-3#Fig2). Hope this helps!
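As a rough illustration (a minimal sketch, not code from this repository; the standard Split MNIST split into digit pairs is assumed), the inputs are identical in all three scenarios, but the target the network has to predict, and whether the context label is available, differ:

```python
digit = 7                   # original MNIST label (0-9)
context = digit // 2        # Split MNIST context: (0,1)->0, (2,3)->1, ..., (8,9)->4

# Task-incremental: context label is provided; predict the within-context label.
task_il_target = (context, digit % 2)   # e.g. (3, 1): "second class of context 3"

# Domain-incremental: context label is NOT provided; predict only the within-context label.
domain_il_target = digit % 2            # e.g. 1: binary decision, context unknown

# Class-incremental: context label is NOT provided; predict the global label.
class_il_target = digit                 # e.g. 7: full 10-way classification
```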

So, for joint training (task-incremental and domain-incremental), the output size is equal to the 'within-context' label size (for the above example, it's 2), and for class-incremental it's the 'global-label' size, which is 10 in the above case. Is my understanding correct?

For domain- and class-incremental learning that is correct. For task-incremental learning the output size is typically taken to be equal to the 'global-label' size, with the provided context label used to set only the output units of the classes in the current context to 'active' (i.e., a multi-head output layer).
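To make that concrete, here is a minimal, hypothetical sketch (not the implementation used in this repository; the per-context class ordering and variable names are assumptions) of a multi-head output layer realized by masking the inactive output units:

```python
import torch

num_classes = 10           # 'global-label' size for Split MNIST
classes_per_context = 2    # classes within each context

logits = torch.randn(4, num_classes)      # raw network outputs for a batch of 4
context = torch.tensor([0, 2, 2, 4])      # context label provided for each example

# Keep only the output units belonging to each example's context ("active head");
# set all other units to -inf so they cannot win the argmax / softmax.
masked = torch.full_like(logits, float('-inf'))
for i, c in enumerate(context.tolist()):
    lo, hi = c * classes_per_context, (c + 1) * classes_per_context
    masked[i, lo:hi] = logits[i, lo:hi]

pred = masked.argmax(dim=1)               # predictions restricted to the active head
```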