lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch


MAE decoder pos_emb

dnecho opened this issue

Is it necessary to add pos_emb to decoder_tokens?

decoder_tokens = decoder_tokens + self.decoder_pos_emb(unmasked_indices)
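For context, here is a minimal sketch (not the exact repository code) of the decoder step being asked about, assuming a learned decoder_pos_emb = nn.Embedding(num_patches, decoder_dim) and that unmasked_indices / masked_indices hold the patch positions kept and dropped by the encoder:

import torch
import torch.nn as nn

num_patches, decoder_dim = 64, 512
decoder_pos_emb = nn.Embedding(num_patches, decoder_dim)

batch, num_unmasked, num_masked = 2, 16, 48
decoder_tokens = torch.randn(batch, num_unmasked, decoder_dim)          # projected encoder outputs
mask_tokens = torch.randn(1, 1, decoder_dim).expand(batch, num_masked, -1)

unmasked_indices = torch.arange(num_unmasked).expand(batch, -1)
masked_indices = torch.arange(num_unmasked, num_patches).expand(batch, -1)

# the line under discussion: add positional information to the *unmasked* decoder tokens,
# even though they already saw positional embeddings on the encoder side
decoder_tokens = decoder_tokens + decoder_pos_emb(unmasked_indices)

# mask tokens carry no content at all, so they need positional information regardless
mask_tokens = mask_tokens + decoder_pos_emb(masked_indices)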

@dnecho ohh yes, I'm actually not so sure about that - you may be right that it isn't necessary for the unmasked tokens

@dnecho it probably wouldn't hurt to keep it the way it is

@dnecho the other thing to experiment with is reusing the positional embeddings from the original encoder side
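One rough sketch of that experiment, under the assumption that the encoder exposes a pos_embedding parameter of shape 1 x num_patches x encoder_dim (the projection layer enc_to_dec_pos below is a hypothetical name, not the repository's): reuse the encoder's positional embedding on the decoder side, projecting it down when the two widths differ, instead of learning a separate decoder_pos_emb.

import torch
import torch.nn as nn

encoder_dim, decoder_dim, num_patches, batch = 768, 512, 64, 2

# encoder-side positional embedding, shared with the decoder
encoder_pos_embedding = nn.Parameter(torch.randn(1, num_patches, encoder_dim))

# project encoder positions into the decoder width
enc_to_dec_pos = nn.Linear(encoder_dim, decoder_dim)

unmasked_indices = torch.arange(16).expand(batch, -1)
decoder_tokens = torch.randn(batch, 16, decoder_dim)

shared_pos = enc_to_dec_pos(encoder_pos_embedding)[0]   # num_patches x decoder_dim
decoder_tokens = decoder_tokens + shared_pos[unmasked_indices]

Whether the shared embedding helps reconstruction quality versus a separately learned decoder embedding would have to be verified empirically.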