arthurdouillard / incremental_learning.pytorch

A collection of incremental learning paper implementations including PODNet (ECCV20) and Ghost (CVPR-W21).

Weird results of icarl on Cifar-100

GengDavid opened this issue · comments

Hi, @arthurdouillard
Thanks for your great work! I am trying to use your code to reproduce the iCaRL method, but my results do not match the ones reported in your paper.
I run the following script:

python3 -minclearn --options options/icarl/icarl_cifar100.yaml options/data/cifar100_3orders.yaml \
    --initial-increment 50 --increment 1 --fixed-memory \
    --device 0 --label icarl_cnn_cifar100_50steps \
    --data-path data

I obtain 44.96, 42.55, and 27.76 with three seeds, giving avg = 38.43 +/- 9.32. Am I missing something needed to reproduce the results? Thanks.

Posting the log here for reference:

 2021-10-19:23:12:46 [train.py]: Eval on 0->100.
 2021-10-19:23:12:49 [train.py]: icarl_cnn_cifar100_50steps
 2021-10-19:23:12:49 [train.py]: Avg inc acc: 0.2775882352941177.
 2021-10-19:23:12:49 [train.py]: Current acc: {'total': 0.186, '00-09': 0.202, '10-19': 0.184, '20-29': 0.222, '30-39': 0.178, '40-49': 0.219, '50-59': 0.139, '60-69': 0.21, '70-79': 0.162, '80-89': 0.163, '90-99': 0.18}.
 2021-10-19:23:12:49 [train.py]: Avg inc acc top5: 0.5672156862745097.
 2021-10-19:23:12:49 [train.py]: Current acc top5: {'total': 0.437}.
 2021-10-19:23:12:49 [train.py]: Forgetting: 0.47154545454545455.
 2021-10-19:23:12:49 [train.py]: Cord metric: 0.26.
 2021-10-19:23:12:49 [train.py]: Old accuracy: 0.18, mean: 0.26.
 2021-10-19:23:12:49 [train.py]: New accuracy: 0.51, mean: 0.66.
 2021-10-19:23:12:49 [train.py]: Average Incremental Accuracy: 0.2775882352941177.
 2021-10-19:23:12:49 [train.py]: Training finished in 4317s.
 2021-10-19:23:12:49 [train.py]: Label was: icarl_cnn_cifar100_50steps
 2021-10-19:23:12:49 [train.py]: Results done on 3 seeds: avg: 38.43 +/- 9.32, last: 28.0 +/- 8.14, forgetting: 41.62 +/- 5.17
 2021-10-19:23:12:49 [train.py]: Individual results avg: [44.96, 42.55, 27.76]
 2021-10-19:23:12:49 [train.py]: Individual results last: [32.9, 32.5, 18.6]
 2021-10-19:23:12:49 [train.py]: Individual results forget: [40.79, 36.92, 47.15]

Hum, your third class order has weirdly low results, while the first two seem to correspond to my paper's results.

I'm going to run it on my side.

OK, that's very weird. If I run all three orders one after the other, as you did, I get your results.

But if I launch only the third order (just edit the cifar100_3orders.yaml file to keep only the third order and third seed), I get avg: 45.77, last: 32.8, forgetting: 41.85, so a "normal" result.

I don't really know the cause yet, but for now, try running each class order in a different process (a sketch of that is below). I hope that helps!
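
For instance, a minimal sketch of that workaround; it assumes you split cifar100_3orders.yaml into one hypothetical file per class order (cifar100_order0.yaml, etc.), which are not part of the repo:

import subprocess

# Hypothetical workaround sketch: launch each class order in its own
# process so no state can leak from one run into the next. Flags match
# the command from the original post; the per-order option files
# (cifar100_order0.yaml, ...) are assumed, not shipped with the repo.
for order in range(3):
    subprocess.run(
        [
            "python3", "-minclearn",
            "--options", "options/icarl/icarl_cifar100.yaml",
            f"options/data/cifar100_order{order}.yaml",
            "--initial-increment", "50",
            "--increment", "1",
            "--fixed-memory",
            "--device", "0",
            "--label", f"icarl_cnn_cifar100_50steps_order{order}",
            "--data-path", "data",
        ],
        check=True,  # abort the loop if any run fails
    )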

Thanks for your feedback! I'll run them separately. If I find the reason, I will come back to let you know.

Hi, @arthurdouillard
I have another question about your iCaRL implementation. In the function build_examplars, it seems that you use all the data to reconstruct the exemplar set. In continual learning, previous data are supposed to be unavailable, right? I am a little confused about this function.

I'm extracting all features because it's simpler that way, but I'm not actually doing any new herding on old data; the line

if class_idx >= self._n_classes - self._task_size:
means that only the new classes' data will be sampled.

However, the old classes' exemplar sets are still reduced to fit the memory budget:

herding_indexes[class_idx] = selected_indexes

And note that we use only the selected features, not all of them, to compute the exemplar mean:

selected_d = D[..., indexes]
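
Putting those pieces together, here is a minimal sketch of the flow in plain NumPy; the names, shapes, and the greedy herding step are illustrative, not the repo's exact code:

import numpy as np

def herding_selection(D, n):
    # Greedy herding: repeatedly pick the sample that keeps the running
    # exemplar mean closest to the true class mean.
    mu = D.mean(axis=1)
    w_t = mu.copy()
    selected = []
    for _ in range(min(n, D.shape[1])):
        idx = int(np.argmax(w_t @ D))
        selected.append(idx)
        w_t = w_t + mu - D[:, idx]
    return selected

def build_examplars(features, herding_indexes, n_classes, task_size,
                    memory_per_class):
    # features[class_idx] is assumed to be an (n_samples, feat_dim) array
    # of extracted features for that class.
    class_means = []
    for class_idx in range(n_classes):
        D = features[class_idx].T                    # (feat_dim, n_samples)
        D = D / (np.linalg.norm(D, axis=0) + 1e-8)   # L2-normalize features

        if class_idx >= n_classes - task_size:
            # Herding runs only for the NEW classes of the current task;
            # old classes keep the indexes selected in earlier tasks.
            herding_indexes.append(herding_selection(D, memory_per_class))

        # Old and new classes alike are cut down to the memory budget...
        indexes = herding_indexes[class_idx][:memory_per_class]
        # ...and the class mean uses only the selected exemplars.
        selected_d = D[..., indexes]
        class_means.append(selected_d.mean(axis=-1))
    return class_means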

Is that clearer?

I think I missed that if condition. Now I see the difference.
Thanks for your kind explanation!

My pleasure :)