Why freeze the parameters of conv1 in ViT?
Yuting-Gao opened this issue
Yuting-Gao commented
zlccccc commented
As described in MoCo v3 [https://arxiv.org/abs/2104.02057], random patch projection (i.e., freezing the parameters of conv1, the patch-embedding layer, in ViT) stabilizes training and gives smoother, better training curves. We see the same effect in our framework. However, although He et al. argue that this stability improves final accuracy, it gave no significant accuracy gain in our previous experiments.
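For illustration, here is a minimal PyTorch sketch of random patch projection. It assumes the patch embedding is an `nn.Conv2d` named `conv1` with kernel size equal to stride equal to patch size, as in CLIP-style ViTs. `TinyViTStem` and `freeze_patch_projection` are hypothetical names, and the linear `head` stands in for the rest of the ViT. This is not the framework's actual code. The layer keeps its random initialization because `requires_grad` is set to `False`, and only trainable parameters are passed to the optimizer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyViTStem(nn.Module):
    """Hypothetical toy model: conv1 is the patch projection (kernel = stride = patch size)."""

    def __init__(self, patch_size: int = 16, width: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(3, width, kernel_size=patch_size, stride=patch_size, bias=False)
        self.head = nn.Linear(width, 10)  # stand-in for the transformer blocks and head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv1(x)             # [B, width, H/p, W/p] patch embeddings
        x = x.flatten(2).mean(dim=2)  # average over patch tokens -> [B, width]
        return self.head(x)


def freeze_patch_projection(model: nn.Module) -> None:
    # Random patch projection: keep conv1 fixed at its random initialization.
    for p in model.conv1.parameters():
        p.requires_grad = False


model = TinyViTStem()
freeze_patch_projection(model)
conv1_before = model.conv1.weight.detach().clone()
head_before = model.head.weight.detach().clone()

# Pass only trainable parameters to the optimizer.
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)
loss = model(torch.randn(2, 3, 32, 32)).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(torch.equal(model.conv1.weight, conv1_before))  # True: patch projection unchanged
print(torch.equal(model.head.weight, head_before))    # False: the rest of the model trains
```

Filtering the optimizer's parameter list is optional here, since frozen parameters receive no gradient. It does keep weight decay and momentum state from touching the frozen layer.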