LTH14 / rcg

PyTorch implementation of RCG https://arxiv.org/abs/2312.03701


Why is the projector head MLP set to requires_grad=False?

woshixiaobai2019 opened this issue

I noticed that a new projector head MLP is added after loading the pre-trained MoCo v3 model. However, the parameters of this newly added component are also set to requires_grad=False.

My question is: since this MLP head is randomly initialized, why does it not require any training before being used for feature projection?

Intuitively, adding an untrained random projection head could disrupt the original feature distributions learned by the pre-trained encoder. So what is the motivation behind fixing the parameters of this newly added head?

Is it about better preserving the pre-trained feature distributions, or about leveraging fixed random projections to improve generalization on downstream tasks?

It would be great if someone could explain the rationale behind not training the added projector head. Thanks!

I have the same question. @LTH14, could you please clarify how the newly added projection head of the MoCo v3 model is trained?

The head in moco_vits is inherited from timm's VisionTransformer, which is a single Linear layer. However, the projection head of the pre-trained MoCo v3 model is an MLP (module.base_encoder.head). I didn't want to change the original MoCo code and re-train that model, so I instead replace the Linear head with an MLP head so that the pre-trained weights (including the pre-trained MLP head weights) can be loaded.
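A minimal sketch of what that head swap could look like (the model name, MLP depth, and dimensions below are illustrative assumptions, not the repository's exact code):

```python
import torch.nn as nn
import timm

# timm's VisionTransformer ends in a single nn.Linear head; the MoCo v3
# checkpoint instead stores an MLP under module.base_encoder.head.
# Swapping the Linear head for an MLP with matching parameter names and
# shapes lets the pre-trained head weights be loaded directly.
encoder = timm.create_model("vit_base_patch16_224", num_classes=256)

embed_dim = encoder.head.in_features  # 768 for ViT-Base
mlp_hidden_dim = 4096                 # assumed hidden width of the MoCo v3 MLP
proj_dim = 256                        # assumed projection dimension

# The exact depth and width must match the checkpoint; this small MLP is
# only an illustration of the idea.
encoder.head = nn.Sequential(
    nn.Linear(embed_dim, mlp_hidden_dim),
    nn.BatchNorm1d(mlp_hidden_dim),
    nn.ReLU(inplace=True),
    nn.Linear(mlp_hidden_dim, proj_dim),
)
```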

Note that the random initialization of the pre-trained encoder (and its MLP head) happens before the pre-trained weights are loaded, so the loaded checkpoint overwrites those random values and the head is no longer random when it is frozen; see https://github.com/LTH14/rcg/blob/main/pixel_generator/mage/models_mage.py#L263-L278
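In other words, the construction order matters: the MLP head is only random until the checkpoint is loaded, after which the whole encoder is frozen. A hedged sketch of that sequence (the checkpoint filename and key layout are assumptions based on the standard MoCo v3 checkpoint format, not the repository's exact code):

```python
import torch

# 1. Build the encoder with the MLP head (see the sketch above) -- at this
#    point all of its weights, including the new head, are randomly initialized.

# 2. Load the MoCo v3 checkpoint. The pre-trained weights stored under
#    module.base_encoder.* (head included) overwrite that random initialization.
checkpoint = torch.load("mocov3_vit_base.pth", map_location="cpu")
state_dict = {
    k.replace("module.base_encoder.", ""): v
    for k, v in checkpoint["state_dict"].items()
    if k.startswith("module.base_encoder.")
}
msg = encoder.load_state_dict(state_dict, strict=False)
print(msg.missing_keys)  # ideally empty if the MLP head shapes line up

# 3. Freeze everything. The encoder is used only as a fixed feature extractor,
#    so requires_grad=False simply keeps it out of the optimizer; by this point
#    the MLP head carries pre-trained weights, not random ones.
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()
```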