Some questions about Axial-attention
Liqq1 opened this issue · comments
Hi~ I have some questions about Axial-attention.
Why is there a permute operation before the view in mode h?
# for mode h
projected_query = self.query_conv(x).permute(0, 1, 3, 2).view(*view).permute(0, 2, 1)
I think the permute is necessary. Although the shapes come out correct for the computation either way, the result has a very different meaning for mode h compared to mode w. Without the permute, projected_query can't actually collect the columns into the dimension of size height.
For example:
For mode w, this way of reshaping is correct.
Without the permute for mode h, it is obviously not what we want.
With the permute for mode h, [0, 5, 10, 15] is indeed a column of a.
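The reshaping behavior described above can be checked with a small standalone sketch (the toy tensor `a` and its shape here are illustrative, not taken from the repository):

```python
import torch

# Toy feature map: (B, C, H, W) = (1, 1, 4, 5), filled row-major 0..19
a = torch.arange(20).view(1, 1, 4, 5)
B, C, H, W = a.shape

# Mode w: W is the last (memory-contiguous) dimension, so a plain view
# already groups the data into rows of length W.
rows = a.view(B * C * H, W)
assert rows[0].tolist() == [0, 1, 2, 3, 4]       # first row of a

# Mode h WITHOUT a permute: view just re-chops the same row-major data,
# so the resulting "sequences" of length H are NOT columns.
no_permute = a.view(B * C * W, H)
assert no_permute[0].tolist() == [0, 1, 2, 3]    # still row data

# Mode h WITH a permute: swapping H and W first makes the columns
# contiguous, so the view really collects columns of length H.
with_permute = a.permute(0, 1, 3, 2).contiguous().view(B * C * W, H)
assert with_permute[0].tolist() == [0, 5, 10, 15]  # first column of a
```

Note that after `permute` the tensor is no longer contiguous, so `.contiguous()` (or `.reshape`) is needed before `.view` in this standalone form.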
I think you are right. Thank you for letting me know. I'll keep this in mind in my future work.
Okay! And thanks for your great work on this code, it's a very clear template!
Any intention to re-train all the models in this repo? @plemeri
I'm not planning to, since this isn't our main contribution, but thanks for letting me know that the authors of CaraNet seem to be using our code without any citation. I really feel bad about it.
Hello, yes, I had sadly misunderstood the usage of the attention module in CaraNet. I initially thought that they implemented the code similarly but ended up not using the axial attention, misinterpreting the initial gamma value of the self-attention layer and mixing up the implementation with the reverse attention module. It is indeed correct that the axial attention is used by CaraNet, unreferenced in the published paper. However, I think they acknowledge that fact in the official repository.
I am sorry for disturbing you with that comment and wanted to delete it once I realized my initial assumption was wrong.