lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Would it be better to swap the `masked_tokens` and `unmasked_tokens`?

CiaoHe opened this issue · comments

Hi Phil,

I wonder whether it would be better to swap the `decoder_tokens` and `mask_tokens` here:

`decoder_tokens = torch.cat((decoder_tokens, mask_tokens), dim = 1)`

since on line 55 the first part is the `masked_indices` and the second part the `unmasked_indices`:

`masked_indices, unmasked_indices = rand_indices[:, :num_masked], rand_indices[:, num_masked:]`
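
For context, a minimal standalone sketch of the alignment argument (toy shapes and random tokens, not the repo's actual code; only the names `rand_indices`, `num_masked`, `mask_tokens`, and `decoder_tokens` come from the snippets above):

```python
import torch

# toy dimensions, chosen only for illustration
batch, num_patches, decoder_dim = 2, 8, 4
num_masked = 6

# as in the snippet above: a random permutation of patch indices,
# split so the FIRST num_masked entries are the masked positions
rand_indices = torch.rand(batch, num_patches).argsort(dim = -1)
masked_indices, unmasked_indices = rand_indices[:, :num_masked], rand_indices[:, num_masked:]

# stand-ins for the decoder inputs: tokens for the unmasked (encoded)
# patches, and learned mask tokens for the masked positions
decoder_tokens = torch.randn(batch, num_patches - num_masked, decoder_dim)
mask_tokens = torch.randn(batch, num_masked, decoder_dim)

# with the proposed swap, the concatenation mirrors the index split:
# masked part first, unmasked part second
full_tokens = torch.cat((mask_tokens, decoder_tokens), dim = 1)
assert full_tokens.shape == (batch, num_patches, decoder_dim)
```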

@CiaoHe hello! 👋 sounds good, i made the change! 9f8c606