microsoft / esvit

EsViT: Efficient self-supervised Vision Transformers

Results without multi-crop

BoPang1996 opened this issue

Hello,
Thanks for the code. I have noticed that the multi-crop trick can boost performance by about 5% top-1 accuracy (on DINO and SwAV). Since your code base supports disabling this trick, did you run experiments without multi-crop, and would you be so kind as to share the results on ImageNet?

No, I did not try experiments without multi-crop. When there are spare GPUs available, I will run the 2-crop settings, and post the results. It will take a while.

For the Swin-Tiny architecture, the 2-crop results are as follows:

| # Pre-train Epochs | Pre-train Task | k-NN | Linear |
|---|---|---|---|
| 100 | L_V | 60.30 | 65.31 |
| 100 | L_V + L_R | 61.95 | 67.84 |
| 300 | L_V | 65.54 | 70.53 |
| 300 | L_V + L_R | 67.56 | 72.02 |
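
For anyone comparing the rows above, here is a minimal sketch that computes how much adding L_R to L_V changes each metric at a given epoch count. The numbers are copied from the 2-crop results in this thread; the dictionary layout and the helper name `gain_from_adding_lr` are my own illustration and not part of the EsViT code base.

```python
# 2-crop Swin-Tiny results copied from this thread:
# (pre-train epochs, pre-train task) -> (k-NN top-1, linear top-1)
results = {
    (100, "L_V"): (60.30, 65.31),
    (100, "L_V + L_R"): (61.95, 67.84),
    (300, "L_V"): (65.54, 70.53),
    (300, "L_V + L_R"): (67.56, 72.02),
}


def gain_from_adding_lr(epochs):
    """Hypothetical helper: return (k-NN gain, linear gain) of L_V + L_R over L_V."""
    base_knn, base_lin = results[(epochs, "L_V")]
    full_knn, full_lin = results[(epochs, "L_V + L_R")]
    # Round to two decimals to match the precision of the reported numbers.
    return round(full_knn - base_knn, 2), round(full_lin - base_lin, 2)


for epochs in (100, 300):
    knn_gain, lin_gain = gain_from_adding_lr(epochs)
    print(f"{epochs} epochs: k-NN +{knn_gain:.2f}, linear +{lin_gain:.2f}")
```

This only restates the reported numbers as differences; it says nothing about the multi-crop setting, for which no results are given in this thread.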