mit-han-lab / lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention

Home Page: https://arxiv.org/abs/2004.11886


About the global and local features in Fig. 3

sanwei111 opened this issue

As we know, a conventional attention module can capture feature patterns like those in Fig. 3(b), covering both the diagonal (local) positions and the other (global) positions; this ability is inherent to attention. What I wonder is: once we add a branch that captures local features, why does the attention module no longer capture features as it did before (i.e., both the diagonal and the other positions), and instead capture only global features?

Same question: how does the model make sure that the attention layers capture global information and the CNN layers capture local information with only a single NLL loss? Have you figured it out?
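For reference, the LSRA block in the paper splits the feature channels between the two branches: one half is fed to self-attention and the other half to a lightweight/dynamic convolution, and the two outputs are concatenated. The convolution's receptive field is limited to its kernel, so that branch can only model local dependencies, while any long-range mixing has to come from the attention half. Below is a minimal, self-contained sketch of such a channel-split block; the class name, the use of `nn.MultiheadAttention`, and the plain depthwise `Conv1d` are simplifications for illustration and do not mirror the repo's actual fairseq-based modules.

```python
import torch
import torch.nn as nn

class LSRABlockSketch(nn.Module):
    """Illustrative channel-split long-short range block (not the repo's code).

    One half of the channels goes through self-attention (long-range / global
    branch); the other half goes through a depthwise convolution (short-range /
    local branch). Outputs are concatenated along the channel dimension.
    """

    def __init__(self, embed_dim: int, num_heads: int = 4, kernel_size: int = 3):
        super().__init__()
        assert embed_dim % 2 == 0
        self.half = embed_dim // 2
        self.attn = nn.MultiheadAttention(self.half, num_heads, batch_first=True)
        self.conv = nn.Conv1d(
            self.half, self.half, kernel_size,
            padding=kernel_size // 2, groups=self.half,  # depthwise: local context only
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim)
        x_global, x_local = x.split(self.half, dim=-1)
        # Global branch: attention can relate any pair of positions.
        attn_out, _ = self.attn(x_global, x_global, x_global)
        # Local branch: convolution only sees a kernel-sized neighborhood.
        conv_out = self.conv(x_local.transpose(1, 2)).transpose(1, 2)
        return torch.cat([attn_out, conv_out], dim=-1)


if __name__ == "__main__":
    block = LSRABlockSketch(embed_dim=64)
    tokens = torch.randn(2, 10, 64)   # (batch, seq_len, embed_dim)
    print(block(tokens).shape)        # torch.Size([2, 10, 64])
```

In this structure there is no explicit loss term forcing the split of roles; since the convolution branch cannot reach distant positions by construction, the attention branch is the only place where global dependencies can be learned, which is consistent with the flatter attention maps shown in Fig. 3(c) of the paper.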


Thank you for your interest in our project. Unfortunately, this repository is no longer actively maintained, so we will be closing this issue. If you have any further questions, please feel free to email us. Thank you again!