Confusion in class OutlookAttention module

Question

axhiao opened this issue 3 years ago

In class OutlookAttention there is self.v = nn.Linear(dim, dim, bias=qkv_bias), and the input to this class is x with shape B, H, W, C = x.shape. My question is how the line v = self.v(x).permute(0, 3, 1, 2)  # B, C, H, W can run without raising an exception, since it amounts to a matrix multiplication of a [B, H, W, C] tensor by a [dim, dim] weight. Also, Algorithm 1 in the original paper uses v_pj = nn.Linear(C, C), but in your code C is replaced with dim. Thanks!

YuanLi · Answer 1 · Mon Jun 28 2021 10:00:10 GMT+0800 (China Standard Time)

dim in both the Transformer and the Outlooker is the channel dimension C, so dim = C.
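
On the shape part of the question: nn.Linear acts only on the last dimension of its input and treats all leading dimensions as batch dimensions, so a [B, H, W, C] tensor passes through nn.Linear(C, C) without error as long as the last dimension equals in_features. The snippet below is a minimal sketch of that behavior, not code from the repository; the sizes and the name v_proj are made up for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
B, H, W, C = 2, 14, 14, 64
dim = C  # as stated in the answer: dim is the channel dimension C

# Stand-in for self.v in OutlookAttention (bias omitted for simplicity).
v_proj = nn.Linear(dim, dim, bias=False)

x = torch.randn(B, H, W, C)

# nn.Linear is applied along the last dimension; B, H, W are kept as-is.
v = v_proj(x)                    # shape: (B, H, W, C)
v_perm = v.permute(0, 3, 1, 2)   # shape: (B, C, H, W)

# Same computation written explicitly: flatten the leading dimensions,
# apply the projection to a 2-D (B*H*W, C) matrix, then restore the shape.
v_flat = v_proj(x.reshape(-1, C)).reshape(B, H, W, C)

print(v.shape, v_perm.shape)
print(torch.allclose(v, v_flat, atol=1e-5))
```

If the last dimension of x did not match dim, the call would fail with a shape mismatch error, which is why dim has to equal C here.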