A question about DropPath in pretraining

Question

YangSun22 opened this issue a year ago

I found that DropPath is set to 0 during pre-training but to 0.1 during fine-tuning. This does not match the way Dropout is normally used: it is supposed to prevent overfitting, so why is it not used in pre-training?

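import torch.nn as nn

# Attention, Mlp, and DropPath come from elsewhere in the codebase (not shown here).
class Block(nn.Module):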

    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,
                 drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm):
        super().__init__()
        self.norm1 = norm_layer(dim)
        self.attn = Attention(
            dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)
        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.norm2 = norm_layer(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

    def forward(self, x):
        x = x + self.drop_path(self.attn(self.norm1(x)))
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x
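
For reference, here is a minimal, framework-free sketch of what DropPath (stochastic depth) does to the output of a residual branch. The DropPath class itself is not shown in this post, so this assumes the common per-sample formulation: during training each sample's branch output is zeroed with probability drop_prob and the survivors are rescaled by 1 / (1 - drop_prob). The function name drop_path and its rng parameter are illustrative only, and plain Python lists stand in for tensors.

```python
import random


def drop_path(branch_out, drop_prob=0.0, training=True, rng=random):
    """Per-sample stochastic depth on a residual branch (illustrative sketch).

    branch_out: list of samples, each a list of floats (stand-in for a tensor).
    """
    # With drop_prob == 0 (the pre-training setting in the question) or at
    # evaluation time, the input passes through untouched, like nn.Identity.
    if drop_prob == 0.0 or not training:
        return branch_out

    keep_prob = 1.0 - drop_prob
    out = []
    for sample in branch_out:
        if rng.random() < keep_prob:
            # Kept: rescale so the expected branch output is unchanged.
            out.append([v / keep_prob for v in sample])
        else:
            # Dropped: the block reduces to the skip connection for this sample.
            out.append([0.0 for _ in sample])
    return out
```

Under this assumption, drop_path=0. in pre-training means every block always runs in full, while 0.1 in fine-tuning skips each residual branch for roughly 10% of the samples in a batch.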

Alex Li · Answer 1 · Tue Dec 19 2023 05:22:44 GMT+0800 (China Standard Time)

It could be that the reconstruction task used in pre-training is hard enough that the model does not overfit, so the extra regularization from DropPath is not needed there.