Question on the results of L2P on imagenet-r
YingjianLi opened this issue · comments
I performed the L2P method on ImageNet-R using the same pretrained model as the original L2P, i.e., ViT_B_16 (https://storage.googleapis.com/vit_models/imagenet21k/ViT-B_16.npz). I used the default config except for the pretrained model. The results on the last task are as follows:
2024-01-11 21:34:33,538 [trainer.py] => All params: 171970192
2024-01-11 21:34:33,540 [trainer.py] => Trainable params: 199880
2024-01-11 21:34:33,540 [l2p.py] => Learning on 180-200
2024-01-11 21:41:09,156 [l2p.py] => Task 9, Epoch 10/10 => Loss 0.115, Train_accy 91.08, Test_accy 72.87
2024-01-11 21:41:55,918 [trainer.py] => No NME accuracy.
2024-01-11 21:41:55,918 [trainer.py] => CNN: {'total': 72.87, '00-19': 78.68, '20-39': 73.56, '40-59': 71.26, '60-79': 71.35, '80-99': 66.67, '100-119': 68.69, '120-139': 72.25, '140-159': 79.0, '160-179': 67.79, '180-199': 75.52, 'old': 72.59, 'new': 75.52}
2024-01-11 21:41:55,918 [trainer.py] => CNN top1 curve: [90.44, 84.38, 80.3, 79.0, 77.01, 75.53, 74.84, 74.34, 73.95, 72.87]
2024-01-11 21:41:55,918 [trainer.py] => CNN top5 curve: [97.94, 96.11, 94.58, 92.54, 91.6, 90.87, 90.42, 90.02, 89.04, 88.82]
2024-01-11 21:41:55,919 [trainer.py] => Average Accuracy (CNN): 78.266
2024-01-11 21:41:55,921 [trainer.py] => Accuracy Matrix (CNN):
[[90.44 87.35 86.32 86.03 84.85 83.24 83.24 80.29 79.71 78.68]
[ 0. 81.31 78.57 78.72 77.66 74.47 74.01 72.34 72.49 68.69]
[ 0. 0. 75.22 76.76 76.08 77.36 73.6 72.83 73.03 72.25]
[ 0. 0. 0. 73.13 73.31 76.42 76.9 80.05 79.66 79. ]
[ 0. 0. 0. 0. 70.81 70.46 74.7 73.4 70.55 67.79]
[ 0. 0. 0. 0. 0. 68.84 70.28 73.49 73.56 75.52]
[ 0. 0. 0. 0. 0. 0. 68.44 70.46 73.15 73.56]
[ 0. 0. 0. 0. 0. 0. 0. 68.44 71. 71.26]
[ 0. 0. 0. 0. 0. 0. 0. 0. 68.64 71.35]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 66.67]]
2024-01-11 21:41:55,921 [trainer.py] => Forgetting (CNN): 4.16111111111111
By the metrics in the original L2P and DualPrompt papers, the Average Accuracy (A_t, i.e., the mean of the last column of the Accuracy Matrix (CNN); see page 20 of DualPrompt), L2P and DualPrompt should achieve 61.57% and 68.13%, respectively. However, by PILOT, the result of L2P is 72.48%.
I am wondering why the performance reported by PILOT is so good. Did I calculate the metric incorrectly or miss something?
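For reference, both the final Average Accuracy and the Forgetting value can be recomputed directly from the logged accuracy matrix. Below is a minimal sketch (the matrix is copied from the log above; the forgetting formula assumes the per-task maximum is taken over all steps, including the final one, which reproduces the logged 4.16):

```python
import numpy as np

# Accuracy matrix copied from the PILOT log above; acc[i, j] is the
# accuracy on task i after learning task j (upper-triangular, j >= i).
acc = np.array([
    [90.44, 87.35, 86.32, 86.03, 84.85, 83.24, 83.24, 80.29, 79.71, 78.68],
    [0.,    81.31, 78.57, 78.72, 77.66, 74.47, 74.01, 72.34, 72.49, 68.69],
    [0.,    0.,    75.22, 76.76, 76.08, 77.36, 73.6,  72.83, 73.03, 72.25],
    [0.,    0.,    0.,    73.13, 73.31, 76.42, 76.9,  80.05, 79.66, 79.  ],
    [0.,    0.,    0.,    0.,    70.81, 70.46, 74.7,  73.4,  70.55, 67.79],
    [0.,    0.,    0.,    0.,    0.,    68.84, 70.28, 73.49, 73.56, 75.52],
    [0.,    0.,    0.,    0.,    0.,    0.,    68.44, 70.46, 73.15, 73.56],
    [0.,    0.,    0.,    0.,    0.,    0.,    0.,    68.44, 71.,   71.26],
    [0.,    0.,    0.,    0.,    0.,    0.,    0.,    0.,    68.64, 71.35],
    [0.,    0.,    0.,    0.,    0.,    0.,    0.,    0.,    0.,    66.67],
])

# DualPrompt-style final Average Accuracy A_T: mean of the last column.
a_T = acc[:, -1].mean()
print(f"A_T = {a_T:.2f}")  # 72.48

# Forgetting: for each old task, the drop from its best accuracy at any
# step to its final accuracy, averaged over the first T-1 tasks.
T = acc.shape[0]
forgetting = np.mean([acc[i, i:].max() - acc[i, -1] for i in range(T - 1)])
print(f"Forgetting = {forgetting:.2f}")  # 4.16
```

This confirms the 72.48% figure quoted above is the mean of the last column, so the gap to the numbers in the papers is not a calculation error on that metric.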
Notably, the result increases to 73+% when using the IN21K-pretrained, IN1K-fine-tuned ViT_B_16 (vit_base_patch16_224.augreg2_in21k_ft_in1k in timm).
Please ignore the error caused by the 'sorted' function, because I was using an old version of PILOT; I found this bug has been fixed in the new version.
Thank you for your reply.
Hi @YingjianLi ,
Thank you for your attention. I'm sorry for the late response.
I think the difference might be due to parameter settings and random seeds. Our main code reference is https://github.com/JH-LEE-KR/l2p-pytorch. Additionally, our ImageNet-R dataset split follows the settings of DualPrompt.
@sun-hailong thank you! I will explore these options.