sail-sg / volo

VOLO: Vision Outlooker for Visual Recognition

Confusion about the OutlookAttention module

axhiao opened this issue

In class `OutlookAttention`, there is `self.v = nn.Linear(dim, dim, bias=qkv_bias)`, and the input to this class is `x`, with shape `B, H, W, C = x.shape`. My question is how the line `v = self.v(x).permute(0, 3, 1, 2)  # B, C, H, W` can run without an exception, since it performs the matrix multiplication [B, H, W, C] * [dim, dim]. Also, in the original paper, Algorithm 1 implements `v_pj = nn.Linear(C, C)`, but in your code C is replaced with `dim`. Thanks!

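`dim` in the Transformer and Outlooker blocks is the channel dimension C, so `dim = C`. `nn.Linear` acts only on the last dimension of its input and treats all leading dimensions as batch dimensions, so a `[B, H, W, C]` input is projected at each spatial position and no exception is raised.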
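
The shape behaviour asked about can be checked directly. The following is a minimal sketch, assuming PyTorch is installed; the tensor sizes and the name `v_proj` are illustrative stand-ins and are not taken from the VOLO code or configs.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumption); dim is the channel count C.
B, H, W, C = 2, 7, 7, 16
dim = C

x = torch.randn(B, H, W, C)

# Hypothetical stand-in for self.v in OutlookAttention.
v_proj = nn.Linear(dim, dim, bias=False)

# nn.Linear transforms the last dimension only; B, H, W are
# treated as batch dimensions, so no reshape is needed.
v = v_proj(x)                    # shape: (B, H, W, C)
v_perm = v.permute(0, 3, 1, 2)   # shape: (B, C, H, W)

# Same result as flattening all positions into rows, multiplying
# by the transposed weight, and restoring the original layout.
v_manual = (x.reshape(-1, C) @ v_proj.weight.t()).reshape(B, H, W, C)
same = torch.allclose(v, v_manual, atol=1e-5)

print(tuple(v.shape), tuple(v_perm.shape), same)
```

If `dim` did not match the size of the last dimension of `x`, the call to `v_proj(x)` would fail with a shape mismatch, which is why `dim` has to equal C here.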