Confusion in class OutlookAttention module
axhiao opened this issue
Minglei Yin commented
In class `OutlookAttention`, there is `self.v = nn.Linear(dim, dim, bias=qkv_bias)`, and the input of this class is `x`, whose shape is `B, H, W, C = x.shape`. My question is how the line `v = self.v(x).permute(0, 3, 1, 2)  # B, C, H, W` can run without an exception, since it performs the matrix multiplication `[B, H, W, C] * [dim, dim]`. Also, in the original paper, Algorithm 1 implements `v_pj = nn.Linear(C, C)`, but in your code `C` is replaced with `dim`. Thanks!
YuanLi commented
`dim` in Transformer and Outlooker is the channel dimension `C`, so `dim = C`.
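For anyone else hitting the same confusion, here is a minimal sketch of why the call works. It assumes only standard PyTorch behaviour: `nn.Linear` is applied to the last dimension of its input and leaves all leading dimensions untouched, so a `[B, H, W, C]` tensor is fine as long as `C == dim`. The sizes and the name `v_proj` are made up for illustration and are not taken from the repository.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
B, H, W, C = 2, 7, 7, 16
dim = C  # dim is the channel size, as stated in the reply above

x = torch.randn(B, H, W, C)
v_proj = nn.Linear(dim, dim, bias=False)  # stands in for self.v

# nn.Linear acts on the last dimension; B, H, W are treated as batch dims.
out = v_proj(x)               # shape: [B, H, W, C]
v = out.permute(0, 3, 1, 2)   # shape: [B, C, H, W]

# Equivalent explicit form: flatten the leading dims, multiply by weight^T,
# then restore the original layout.
manual = (x.reshape(-1, C) @ v_proj.weight.t()).reshape(B, H, W, dim)

print(out.shape, v.shape)
print(torch.allclose(out, manual, atol=1e-5))
```

If `C` did not match `dim`, the same call would raise a shape-mismatch error, which is the exception the question was expecting.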