cannot reproduce the results on CIFAR100
JungHunOh opened this issue
Dear authors,
Firstly, I would like to express my appreciation for your interesting and motivating work. Thank you for your contributions to the field.
I am writing to inquire about the code you have provided. I have attempted to reproduce the results on CIFAR-100 using it, but unfortunately I have encountered some issues. (I am unsure whether these issues also occur on other datasets.)
Specifically, there seems to be a problem during the incremental sessions.
I observed a loss explosion after session 4.
I should mention that the experiments were conducted in a Docker environment.
I am wondering if there are any problems with the current version of the code that may have caused these issues.
I attached the log files.
20230410_150211.log
20230410_160540.log
Thank you in advance for your time and assistance. I look forward to hearing back from you soon.
Hi @JungHunOh ,
Thanks for your interest in our work.
I had a brief look at your log. I noticed that you are using two GPUs:
GPU 0,1: NVIDIA GeForce RTX 2080 Ti
However, you did not change the per-GPU batch size (samples_per_gpu):
```python
data = dict(
    samples_per_gpu=64,
    workers_per_gpu=8,
    train_dataloader=dict(persistent_workers=True),
    val_dataloader=dict(persistent_workers=True),
    test_dataloader=dict(persistent_workers=True),
    ...
```
So, the total batch size will be very different, which may lead to very different results.
Please consider running the code on an 8-GPU machine to reproduce the results.
If you insist on running on a 2-GPU machine, please consider changing `samples_per_gpu=64` to `samples_per_gpu=256`.
But I want to note that this may still produce slightly different results due to subtle differences inside the PyTorch implementation.
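The arithmetic behind the suggestion can be sketched as follows. This is a minimal illustration, assuming the reference setup is 8 GPUs with `samples_per_gpu=64` (a total batch size of 512); `samples_per_gpu_for` is a hypothetical helper, not part of the repository.

```python
# Sketch: keep the total batch size constant when changing the GPU count.
# Assumption: the reference config uses 8 GPUs * samples_per_gpu=64 = 512 total.

def samples_per_gpu_for(num_gpus, total_batch_size=512):
    """Return the per-GPU batch size that preserves the total batch size."""
    if total_batch_size % num_gpus != 0:
        raise ValueError("total batch size must be divisible by num_gpus")
    return total_batch_size // num_gpus

print(samples_per_gpu_for(8))  # 64, matching the provided config
print(samples_per_gpu_for(2))  # 256, the value suggested for a 2-GPU machine
```

With the unchanged `samples_per_gpu=64` on 2 GPUs, the total batch size drops from 512 to 128, which is why the training dynamics (and losses) can diverge from the reported runs.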
Regards,
Haobo Yuan
Hi @JungHunOh ,
I would like to close the issue for now; feel free to re-open it or raise a new one if you have any other questions.
Thanks again for your interest.
Best,
Haobo Yuan
Thank you very much for your detailed answers.
My concerns are resolved.