Why freeze the parameters of conv1 in ViT?

Yuting-Gao opened this issue 2 years ago · comments

zlccccc · Answer 1 · Sat Mar 19 2022 18:06:26 GMT+0800 (China Standard Time)

As described in MoCo v3 [https://arxiv.org/abs/2104.02057], random patch projection (i.e., freezing the parameters of conv1 in ViT) stabilizes training and gives smoother, better training curves. This also holds in our framework. However, although the MoCo v3 authors argue that the added stability benefits final accuracy, we saw no significant gain in our previous experiments.
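
For readers who want to see what freezing conv1 amounts to in practice, here is a minimal PyTorch sketch. It is not this repository's code: `TinyViTStem`, `freeze_patch_projection`, and all sizes are hypothetical names and values chosen for illustration, and the only thing assumed from the thread is that the patch projection is a convolution called `conv1`. The layer keeps its random initial weights and is left out of the optimizer.

```python
import torch
import torch.nn as nn


class TinyViTStem(nn.Module):
    """Hypothetical stand-in for a ViT; only the patch projection matters here."""

    def __init__(self, patch_size=16, width=64, num_classes=10):
        super().__init__()
        # conv1 is the patch projection: a stride-p convolution over p x p patches.
        self.conv1 = nn.Conv2d(3, width, kernel_size=patch_size,
                               stride=patch_size, bias=False)
        # A linear head replaces the transformer blocks to keep the sketch short.
        self.head = nn.Linear(width, num_classes)

    def forward(self, x):
        x = self.conv1(x)              # (B, width, H/p, W/p)
        x = x.flatten(2).mean(dim=2)   # average the patch tokens -> (B, width)
        return self.head(x)


def freeze_patch_projection(model):
    """Keep conv1 at its random initialization by disabling its gradients."""
    for param in model.conv1.parameters():
        param.requires_grad = False
    return model


torch.manual_seed(0)
model = freeze_patch_projection(TinyViTStem())
conv1_before = model.conv1.weight.detach().clone()
head_before = model.head.weight.detach().clone()

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

# One training step on random data (illustrative only).
images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(model(images), labels)
loss.backward()
optimizer.step()
```

After the step, `model.conv1.weight` still equals `conv1_before` and has no gradient, while the head weights have moved. In a real ViT the same `requires_grad = False` loop would be applied to the model's own patch-projection layer before the optimizer is built.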