Some questions about Axial-attention
Liqq1 opened this issue · comments
Hi~ I have some questions about Axial-attention.
Why is there a permute operation before the view in mode h?
# for mode h
projected_query = self.query_conv(x).permute(0, 1, 3, 2).view(*view).permute(0, 2, 1)
I think the permute is necessary. Although the shapes come out correct for the computation either way, the result has a very different meaning for mode h compared to mode w. Without the permute, projected_query can't actually collect the columns into the dimension of size height.
For example:
For mode w, this way of reshaping is correct.
Without the permute for mode h, it is obviously not what we want.
With the permute for mode h, [0, 5, 10, 15] is indeed a column of a.
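The reshaping behavior described above can be checked with a small standalone sketch (the toy tensor `a` and its shape here are illustrative, not taken from the repository):

```python
import torch

# Toy feature map: (B, C, H, W) = (1, 1, 4, 5), filled row-major 0..19
a = torch.arange(20).view(1, 1, 4, 5)
B, C, H, W = a.shape

# Mode w: W is the last (memory-contiguous) dimension, so a plain view
# already groups the data into rows of length W.
rows = a.view(B * C * H, W)
assert rows[0].tolist() == [0, 1, 2, 3, 4]       # first row of a

# Mode h WITHOUT a permute: view just re-chops the same row-major data,
# so the resulting "sequences" of length H are NOT columns.
no_permute = a.view(B * C * W, H)
assert no_permute[0].tolist() == [0, 1, 2, 3]    # still row data

# Mode h WITH a permute: swapping H and W first makes the columns
# contiguous, so the view really collects columns of length H.
with_permute = a.permute(0, 1, 3, 2).contiguous().view(B * C * W, H)
assert with_permute[0].tolist() == [0, 5, 10, 15]  # first column of a
```

Note that after `permute` the tensor is no longer contiguous, so `.contiguous()` (or `.reshape`) is needed before `.view` in this standalone form.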
I think you are right. Thank you for letting me know. I'll keep this in mind in my future work.
Okay! And thanks for your great work on this code, it's a very clear template!
Any intention to re-train all the models in this repo? @plemeri
I'm not planning to, since this isn't our main contribution, but thanks for letting me know that the authors of CaraNet seem to be using our code without any citation. I really feel bad about it.
Hello, yes, I had sadly misunderstood the usage of the attention module in CaraNet. I initially thought that they implemented the code similarly but ended up not using the axial attention, misinterpreting the initial gamma value of the self-attention layer and mixing up the implementation with the reverse attention module. It is indeed correct that the axial attention is used by CaraNet, unreferenced in the published paper. However, I think they acknowledge that fact in the official repository.
I am sorry for disturbing you with that comment and wanted to delete it once I realized my initial assumption was wrong.