facebookresearch / mae

PyTorch implementation of MAE: https://arxiv.org/abs/2111.06377

A question about DropPath in pretraining

YangSun22 opened this issue · comments

I noticed that DropPath is set to 0 during pre-training but to 0.1 during finetuning. That does not match how Dropout is normally used: it is supposed to prevent overfitting, so why is it not applied during pre-training as well? For reference, here is the Block that applies it:

class Block(nn.Module):

    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,
                 drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm):
        super().__init__()
        self.norm1 = norm_layer(dim)
        self.attn = Attention(
            dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)
        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.norm2 = norm_layer(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

    def forward(self, x):
        x = x + self.drop_path(self.attn(self.norm1(x)))
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x
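For context on what the regularizer does: DropPath (stochastic depth) randomly skips the residual branch for a subset of samples in each batch during training. A minimal sketch of the idea (simplified, not necessarily identical to timm's implementation):

import torch.nn as nn

class DropPath(nn.Module):
    """Stochastic depth: zero the residual branch for randomly chosen samples."""
    def __init__(self, drop_prob=0.):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        if self.drop_prob == 0. or not self.training:
            return x  # identity at p=0 and at eval time
        keep_prob = 1. - self.drop_prob
        # one Bernoulli draw per sample, broadcast over the remaining dims
        shape = (x.shape[0],) + (1,) * (x.ndim - 1)
        mask = x.new_empty(shape).bernoulli_(keep_prob)
        return x * mask / keep_prob  # rescale so the expected value is unchanged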

It could be that the reconstruction task is hard enough that there is no overfitting during pre-training, so the extra regularization is simply not needed there.
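For completeness: if one did want to experiment with stochastic depth during pre-training, the usual ViT recipe is a per-block, linearly increasing drop-path rate. A hedged sketch reusing the Block class quoted above (this is not what the repo does by default, where the pre-training rate is just 0; the sizes below are only illustrative):

import torch
import torch.nn as nn

depth, dim, num_heads = 12, 768, 12  # ViT-Base-sized example
drop_path_rate = 0.1                 # 0.0 reproduces the pre-training setting
# linearly increase the rate from 0 in the first block to drop_path_rate in the last
dpr = [r.item() for r in torch.linspace(0, drop_path_rate, depth)]
blocks = nn.ModuleList([
    Block(dim=dim, num_heads=num_heads, qkv_bias=True, drop_path=dpr[i])
    for i in range(depth)
])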