Low accuracy for sessions greater than 1
fransiskusyoga opened this issue
I tried to reproduce your work using the provided Docker image, training and evaluating on CIFAR. Because I only have 1 GPU, I edited the config to use samples_per_gpu=512 instead of 64, then ran this command:
bash tools/dist_train.sh configs/cifar/resnet12_etf_bs512_200e_cifar.py 1 --work-dir /opt/logger/cifar_etf --seed 0 --deterministic && bash tools/run_fscil.sh configs/cifar/resnet12_etf_bs512_200e_cifar_eval.py /opt/logger/cifar_etf /opt/logger/cifar_etf/best.pth 1 --seed 0 --deterministic
The result is as follows:
2023-04-14 21:32:58,114 - mmcls - INFO - loss1 5.088033676147461 ; loss2 5.087942123413086
2023-04-14 21:32:58,119 - mmcls - INFO - [198/200] Training session : 9 ; lr : 0.00025 ; loss : 5.035999774932861 ; acc@1 : 0.0
2023-04-14 21:32:58,119 - mmcls - INFO - loss1 5.036115646362305 ; loss2 5.035883903503418
2023-04-14 21:32:58,124 - mmcls - INFO - [199/200] Training session : 9 ; lr : 0.00025 ; loss : 5.110080718994141 ; acc@1 : 0.0
2023-04-14 21:32:58,127 - mmcls - INFO - [200/200] Training session : 9 ; lr : 0.00025 ; loss : 5.047082901000977 ; acc@1 : 12.5
2023-04-14 21:32:58,127 - mmcls - INFO - Evaluating session 9, from 0 to 100.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 10000/10000, 17685.8 task/s, elapsed: 1s, ETA: 0s
2023-04-14 21:32:58,816 - mmcls - INFO - [09]Evaluation results : acc : 1.00 ; acc_base : 0.00 ; acc_inc : 2.50
2023-04-14 21:32:58,816 - mmcls - INFO - [09]Evaluation results : acc_incremental_old : 2.86 ; acc_incremental_new : 0.00
2023-04-14 21:32:58,888 - mmcls - INFO - 82.73 57.63 1.40 1.33 1.25 1.18 1.11 1.05 1.00
The evaluation accuracy after session 0 is very low.
Hi @fransiskusyoga ,
Thanks for your interest in our work.
Could you please provide the full log for me to check the details?
Best,
Haobo Yuan
Based on the current information, my guess is that you set the batch_size for
configs/cifar/resnet12_etf_bs512_200e_cifar.py
only, but did not set it for
configs/cifar/resnet12_etf_bs512_200e_cifar_eval.py
since the base session looks fine but the incremental sessions do not.
I edited neither of them. I just edited this:
configs/_base_/datasets/cifar_fscil.py
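Roughly, the only change was the per-GPU batch size; a minimal sketch of that edit (assuming the usual mmcls-style data dict, with unrelated keys left untouched):

# configs/_base_/datasets/cifar_fscil.py -- sketch of my edit, other keys omitted
data = dict(
    samples_per_gpu=512,  # was 64; 1 GPU x 512 matches (as I understand it) the original 8 GPUs x 64
    # workers_per_gpu, train, val, test, ... left as-is
)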
mylog.log
This is the evaluation log.
@fransiskusyoga I am afraid the log seems to be empty.
How about this one?
mylog.log
It seems the config is right, but the loss cannot decrease during incremental training. Considering that there will be subtle differences between using 1 GPU and multiple GPUs, you may need to adjust the incremental hyperparameters, or try again with 8 GPUs.
You may need to set the value at
Line 554 in f89a4ef
to 64 to match the incremental batch size if you insist on using 1 GPU for incremental training.
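The idea is just that the incremental sessions should keep their original batch size rather than inherit the enlarged 512; a purely illustrative sketch (the identifier at line 554 may be named differently):

# Illustrative only -- not the actual code at line 554.
# On 1 GPU, keep the incremental-session batch size at 64 instead of the
# enlarged 512 used for base-session training.
inc_samples_per_gpu = 64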
I am not sure whether there are any other bugs when using 1 GPU for incremental training, so please let me know if you have any other questions.
Hi @fransiskusyoga ,
I would like to close this issue for now; feel free to re-open it or raise a new one if you have any other questions.
Thanks again for your interest.
Regards,
Haobo Yuan