About the attention calculation in FASA
csguoh opened this issue · comments
Hi, authors.
This work really inspires me! I have a question about the Frequency domain-based self-attention solver.
In this line, it seems that you directly use element-wise multiplication; however, classic attention uses matrix multiplication (matmul, or @).
I cannot find any explanation of this in the paper, so could you give me some insight? Thanks :D
Hi, authors.
Element-wise multiplication is used here rather than matrix multiplication.
I am also confused by this and would appreciate an explanation. Thanks.
I guess the reason the authors use the element-wise product is that multiplication in the frequency domain is equivalent to convolution in the spatial domain.
Just my opinion:
out = self.norm(out) # calculate the score matrix
output = v * out # multiply the v matrix by the score matrix
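To illustrate the convolution-theorem reading suggested above, here is a minimal NumPy sketch (my own, not from the FASA code) showing that an element-wise product of two signals' FFTs equals their circular convolution in the spatial domain:

```python
import numpy as np

rng = np.random.default_rng(0)
score = rng.standard_normal(8)  # stand-in for the score matrix `out`
v = rng.standard_normal(8)      # stand-in for the value tensor `v`

# Spatial-domain circular convolution, computed directly from the definition.
n = len(v)
conv = np.array([sum(score[k] * v[(i - k) % n] for k in range(n))
                 for i in range(n)])

# Frequency-domain route: FFT both signals, multiply element-wise, inverse FFT.
freq = np.fft.ifft(np.fft.fft(score) * np.fft.fft(v)).real

# The two routes agree, so an element-wise product on frequency-domain
# features acts like a (content-dependent) convolution on spatial features.
assert np.allclose(conv, freq)
```

So if `v` and `out` are already frequency-domain representations, `v * out` would aggregate information globally across the spatial dimension, much like an attention-weighted sum, but at element-wise cost.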