about attn_mask and relative_position_bias
YangGangZhiQi opened this issue
Hi, thank you very much for this Great work!
I have read your code and paper several times, but I still can't fully understand attn_mask and relative_position_bias.
In Line 134, why do we just add attn_mask to attn instead of multiplying? Used this way, attn_mask looks like a bias, similar to relative_position_bias in Line 130.
I would appreciate it if someone could help me.
This is because we add the attn_mask (with values -100 or 0) before the softmax. Softmax exponentiates its inputs, so e^(x-100) = e^x * e^(-100) ≈ e^x * 0 = 0; the masked positions therefore receive (nearly) zero attention weight.
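To make this concrete, here is a tiny standalone example (toy numbers, not the repository's code) showing that adding -100 to a score before softmax drives that position's weight to ~0:

```python
import torch

# Toy attention scores for one query over 4 keys.
attn = torch.tensor([2.0, 1.0, 0.5, 3.0])

# Mask out the last two keys by adding a large negative value (0 = keep, -100 = drop).
attn_mask = torch.tensor([0.0, 0.0, -100.0, -100.0])

weights = torch.softmax(attn + attn_mask, dim=-1)
print(weights)  # ~tensor([0.7311, 0.2689, 0.0000, 0.0000]): masked keys get ~0 weight
```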
@JingyunLiang Thanks for such a quick reply. I get it.
About relative_position_bias, how should I understand it? What is the role of relative_position_bias in the WindowAttention module? It seems like attn = q @ k^T + relative_position_bias; what would happen if we did not use relative_position_bias?
I am a beginner with SwinIR, so sorry if these questions take up your time.
relative_position_bias tells the model the relative position of two pixels. A pixel should have a higher impact on its neighbouring pixels than on distant ones. Without the bias, attention within a window would be position-agnostic: the dot product q @ k^T alone cannot tell how far apart two pixels are.
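For later readers, here is a minimal sketch of how such a bias can be built, modeled on Swin-style window attention (the 7x7 window, head count, and variable names below are illustrative assumptions, not the repository's exact code). A learnable table holds one value per head for every possible relative offset between two pixels in a window, and a precomputed index gathers those values into a (num_heads, N, N) bias that is added to the attention scores before softmax:

```python
import torch
import torch.nn as nn

window_size = (7, 7)  # (Wh, Ww), illustrative
num_heads = 6         # illustrative

# One learnable bias per head for every possible relative offset:
# offsets range over [-(Wh-1), Wh-1] x [-(Ww-1), Ww-1].
# Initialised to zeros here for brevity (Swin uses a truncated-normal init).
bias_table = nn.Parameter(
    torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1), num_heads)
)

# Pairwise relative coordinates of all N = Wh*Ww positions in the window.
coords = torch.stack(torch.meshgrid(
    torch.arange(window_size[0]), torch.arange(window_size[1]), indexing="ij"
))                                                            # (2, Wh, Ww)
coords_flat = coords.flatten(1)                               # (2, N)
relative = coords_flat[:, :, None] - coords_flat[:, None, :]  # (2, N, N)
relative = relative.permute(1, 2, 0).contiguous()             # (N, N, 2)
relative[:, :, 0] += window_size[0] - 1                       # shift to start from 0
relative[:, :, 1] += window_size[1] - 1
relative[:, :, 0] *= 2 * window_size[1] - 1
relative_position_index = relative.sum(-1)                    # (N, N)

# Gather the per-offset biases into a (num_heads, N, N) tensor.
N = window_size[0] * window_size[1]
relative_position_bias = bias_table[relative_position_index.view(-1)]
relative_position_bias = relative_position_bias.view(N, N, num_heads).permute(2, 0, 1)

# Inside attention it would then be added before softmax, roughly:
# attn = (q @ k.transpose(-2, -1)) * scale + relative_position_bias.unsqueeze(0)
```

Because the table is indexed by relative offset rather than absolute position, pixel pairs at the same offset share one learned bias, and the model can learn to favour nearby pixels over distant ones.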
I get it. Thanks again!