lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch

Although there is no bug in this line, '1 n d' should be '1 1 d' in my opinion; as written, it can be confusing.

KKIverson opened this issue

cls_tokens = repeat(self.cls_token, '1 n d -> b n d', b = b)
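
For context, a minimal sketch with illustrative shapes (in vit-pytorch, self.cls_token has shape (1, 1, dim)), showing that both patterns produce the same tensor here, since the cls token's sequence axis is already 1; the '1 1 d' spelling just makes that explicit:

import torch
from einops import repeat

b, d = 4, 128                       # illustrative batch size and embedding dim
cls_token = torch.randn(1, 1, d)    # stands in for the learnable self.cls_token

out_n = repeat(cls_token, '1 n d -> b n d', b = b)  # einops binds n = 1 from the input shape
out_1 = repeat(cls_token, '1 1 d -> b 1 d', b = b)  # same result, singleton axis made explicit
assert torch.equal(out_n, out_1) and out_1.shape == (b, 1, d)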

@KKIverson ohh sure, i made the change

so the b has to be kept as the batch size, since we are concatenating the cls tokens along the 1st dimension
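
A minimal sketch of that step, with illustrative shapes (x stands in for the patch embeddings): the cls token is repeated once per batch element and then prepended along dimension 1, the token dimension:

import torch
from einops import repeat

b, n, d = 4, 64, 128                # illustrative batch size, patch count, embedding dim
x = torch.randn(b, n, d)            # stands in for the patch embeddings
cls_token = torch.randn(1, 1, d)    # stands in for the learnable self.cls_token

cls_tokens = repeat(cls_token, '1 1 d -> b 1 d', b = b)  # one cls token per batch element
x = torch.cat((cls_tokens, x), dim = 1)                  # prepend along dim 1, the token dimension
assert x.shape == (b, n + 1, d)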

Thanks a lot. Actually, '1' or 'n' makes no difference functionally, but as a rookie, I think it does affect my understanding. ^_^