lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Masked Auto Encoder, class token and linear probing

Gasp34 opened this issue

Hello,

If I understand correctly, when doing linear probing, you only train the last FC layer.
But the ViT's classification head feeds the class token into that FC layer, and the class token is not trained during the MAE self-supervised task.
How can we expect the class token to contain good features if it was never trained?

Thanks

I ran into the same question, but I believe it shouldn't be a problem: the original MAE paper reports similar linear-probing performance when average pooling over the patch tokens is used instead of the class token. So, since the class token is indeed not usable here, I would suggest trying mean pooling — a rough sketch of how that could look with this library is below.
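A minimal sketch of that workflow, assuming the `pool = 'mean'` option of the `ViT` constructor and a head attribute named `mlp_head` (as in recent versions of vit-pytorch; adjust if your version differs). The hyperparameters are illustrative only:

```python
import torch
from vit_pytorch import ViT
from vit_pytorch.mae import MAE

# Build the encoder with mean pooling, so the classification head
# averages the patch tokens instead of reading the (untrained) class token.
vit = ViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 8,
    mlp_dim = 2048,
    pool = 'mean'      # key change: 'mean' instead of the default 'cls'
)

# Self-supervised pretraining with the masked autoencoder wrapper
mae = MAE(
    encoder = vit,
    masking_ratio = 0.75,
    decoder_dim = 512,
    decoder_depth = 6
)

images = torch.randn(8, 3, 256, 256)
loss = mae(images)
loss.backward()        # ...run your full pretraining loop here

# Linear probing: freeze everything except the classification head
# (attribute name `mlp_head` is an assumption — check your version).
for p in vit.parameters():
    p.requires_grad = False
for p in vit.mlp_head.parameters():
    p.requires_grad = True

optimizer = torch.optim.Adam(vit.mlp_head.parameters(), lr = 1e-3)
logits = vit(images)   # (8, 1000) — train with cross-entropy on your labels
```

With `pool = 'mean'`, the frozen encoder's output fed to the probe is the average of the patch representations that were actually trained by the MAE objective, so the untrained class token no longer matters.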

Good luck!