Sense-GVT / DeCLIP

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

Why freeze the parameters of conv1 in ViT?

Yuting-Gao opened this issue · comments

As described in MoCo v3 (https://arxiv.org/abs/2104.02057), random patch projection (i.e., freezing the parameters of conv1 in ViT) stabilizes training, yielding smoother and better training curves, and this also holds in our framework. However, although He et al. argue that this stability benefits the final accuracy, we did not observe a significant accuracy gain from it in our previous experiments.
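For readers who want to reproduce this trick, below is a minimal PyTorch sketch of freezing the patch-projection conv of a CLIP-style ViT so it stays at its random initialization. The `VisionTransformer` skeleton, the `conv1` attribute name, and the `freeze_patch_projection` helper are assumptions modeled on the CLIP reference implementation, not DeCLIP's actual code.

```python
import torch
import torch.nn as nn

# Sketch only (not DeCLIP's implementation): freeze the patch-projection conv
# of a CLIP-style ViT so patches are embedded by a fixed random projection,
# following the MoCo v3 observation about training stability.

class VisionTransformer(nn.Module):
    def __init__(self, patch_size=16, width=768):
        super().__init__()
        # conv1 acts as the patch projection: (B, 3, H, W) -> (B, width, H/p, W/p)
        self.conv1 = nn.Conv2d(3, width, kernel_size=patch_size,
                               stride=patch_size, bias=False)
        # ... class token, positional embedding, transformer blocks, etc.

    def forward(self, x):
        x = self.conv1(x)                 # patch embedding
        x = x.flatten(2).transpose(1, 2)  # (B, num_patches, width)
        # ... remainder of the ViT forward pass
        return x


def freeze_patch_projection(model: VisionTransformer) -> None:
    """Keep conv1 at its random initialization by disabling its gradients."""
    for p in model.conv1.parameters():
        p.requires_grad = False


model = VisionTransformer()
freeze_patch_projection(model)

# Pass only trainable parameters to the optimizer so the frozen conv1
# is not touched by weight decay either.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```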