Feature sequences
kimkj38 opened this issue · comments
Hello, I'm confused about the token size, because in ViT each token is a square patch (8x8 or 16x16).
As I understand it, the feature sequence is 256 tokens, each a 1024-dimensional vector.
And
So are the 4 partitions in this figure just a simplification? Or did I get it wrong?
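
To make the shapes in the question concrete, here is a minimal sketch of the arithmetic. It is not taken from the repository: the function names, the 224x224 image size, and the reading of "partitions" as equal contiguous chunks along the token axis are all assumptions made for illustration.

```python
def vit_token_count(image_h, image_w, patch):
    """Tokens produced when an image is cut into square patch x patch tiles (ViT-style)."""
    return (image_h // patch) * (image_w // patch)


def split_sequence(num_tokens, num_partitions):
    """Split token indices 0..num_tokens-1 into equal contiguous partitions.

    Assumption: a "partition" is a contiguous chunk of the 1-D token sequence.
    """
    if num_tokens % num_partitions != 0:
        raise ValueError("num_tokens must be divisible by num_partitions")
    size = num_tokens // num_partitions
    return [range(i * size, (i + 1) * size) for i in range(num_partitions)]


# ViT-style: tokens come from a 2-D grid of square patches.
# 224x224 is an assumed image size, used only as an example.
print(vit_token_count(224, 224, 16))  # 14 * 14 patches

# Sequence-style: 256 tokens, each a 1024-dimensional vector (shape 256 x 1024).
seq_len, dim = 256, 1024
parts = split_sequence(seq_len, 4)
print([len(p) for p in parts])  # token count per partition
```

Under these assumptions, splitting 256 tokens into 4 partitions gives 64 tokens per partition, and the 1024-dimensional feature size is unaffected by how the sequence is partitioned.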
Yes, for simplicity, this figure shows the case of
@zhigangjiang Thanks for your reply!