Question about the 'unfold' operation.
JaminFong opened this issue
Hi Li,
I have just read your paper "Tokens-to-Token ViT", which proposes a very interesting and effective method. One question puzzles me.
The "unfold" operation followed by a linear layer that generates "qkv" appears to be equivalent to a k×k convolution with the same stride and padding. Is my understanding correct? Please correct me if I'm wrong.
Hoping for your reply.
Best regards.
Line 63 in fecacc4
T2T-ViT/models/token_performer.py
Line 46 in fecacc4
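To make the question concrete, here is a minimal PyTorch sketch of the claimed equivalence, with no normalization between the two steps. The tensor sizes, the `qkv` layer, and the variable names are assumptions chosen for illustration; they are not taken from the T2T-ViT code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
B, C, H, W = 2, 3, 8, 8
k, stride, padding = 3, 2, 1
out_dim = 12  # e.g. q, k, v concatenated, each of width 4

x = torch.randn(B, C, H, W)
qkv = torch.nn.Linear(C * k * k, out_dim)

with torch.no_grad():
    # Path A: unfold into tokens, then apply the linear layer to each token.
    tokens = F.unfold(x, kernel_size=k, stride=stride, padding=padding)  # (B, C*k*k, L)
    out_linear = qkv(tokens.transpose(1, 2))                             # (B, L, out_dim)

    # Path B: reinterpret the same weights as a k x k convolution kernel.
    # unfold orders each patch as (channel, kernel row, kernel col),
    # which matches the layout of a conv2d weight of shape (out_dim, C, k, k).
    weight = qkv.weight.view(out_dim, C, k, k)
    out_conv = F.conv2d(x, weight, qkv.bias, stride=stride, padding=padding)
    out_conv = out_conv.flatten(2).transpose(1, 2)                       # (B, L, out_dim)

    max_diff = (out_linear - out_conv).abs().max().item()

print(f"max abs difference: {max_diff:.2e}")
```

Under these assumptions the two paths should agree up to floating-point rounding, so the observation holds whenever the linear layer is applied directly to the unfolded patches.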
Hi Jamin,
Good question! Your observation is interesting. However, you cannot directly merge the unfold with the qkv projection, because a LayerNorm is applied before the qkv projection in self-attention.
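The effect of the LayerNorm can be sketched in a few lines of PyTorch. The sizes and layer names below are assumptions for illustration and do not come from the repository. The sketch places a `LayerNorm` between the unfold and the linear layer, compares the result with a plain convolution that reuses the linear weights, and then checks how the normalized path responds when the input is scaled.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
B, C, H, W = 2, 3, 8, 8
k, stride, padding = 3, 2, 1
out_dim = 12

x = torch.randn(B, C, H, W)
norm = torch.nn.LayerNorm(C * k * k)
qkv = torch.nn.Linear(C * k * k, out_dim)

with torch.no_grad():
    tokens = F.unfold(x, kernel_size=k, stride=stride, padding=padding).transpose(1, 2)

    # unfold -> LayerNorm -> linear, the order described in the reply above.
    out_with_norm = qkv(norm(tokens))

    # A plain k x k convolution that reuses the linear weights.
    weight = qkv.weight.view(out_dim, C, k, k)
    out_conv = F.conv2d(x, weight, qkv.bias, stride=stride, padding=padding)
    out_conv = out_conv.flatten(2).transpose(1, 2)

    gap = (out_with_norm - out_conv).abs().max().item()

    # LayerNorm rescales every token by its own statistics, so doubling the
    # input leaves the normalized path almost unchanged. A convolution is
    # affine in its input and cannot behave this way.
    out_scaled = qkv(norm(2 * tokens))
    scale_gap = (out_with_norm - out_scaled).abs().max().item()

print(f"with norm vs. conv: {gap:.3f}")
print(f"with norm, x vs. 2x: {scale_gap:.2e}")
```

Because the normalization depends on each patch's own mean and variance, no fixed convolution kernel can reproduce the normalized path in general, which is why the two operations cannot simply be fused.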
Thanks so much for your quick reply. Yes, the layernorm does matter and may produce some effect.