ShoufaChen / AdaptFormer

[NeurIPS 2022] Implementation of "AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition"

Home Page: https://arxiv.org/abs/2205.13535


Question about AdaptFormer architecture

wngh1187 opened this issue

Hello. Thank you for releasing the code for this excellent research.
However, I noticed that the structure of AdaptFormer differs slightly between Figure 2(b) of the paper and the released code.
In the figure, the trainable layers (AdaptMLP) are shown receiving their input after the second LayerNorm.
In the code (line 79 in models/custom_modules.py), however, the AdaptMLP receives the output of the multi-head attention block as its input.
Which of the two is correct?
Please let me know if I have misunderstood.

Good catch.

The code is correct.
Thanks for pointing it out. We'll fix the figure in the next version.
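For anyone else reading this thread, here is a minimal NumPy sketch of the ordering the maintainers confirm the code uses: the adapter branch reads the token features right after the attention residual, in parallel with the LayerNorm → MLP branch. The function names, the ReLU bottleneck, and the 0.1 scaling factor are illustrative assumptions for this sketch, not the repository's actual API.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the last (feature) dimension.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def adapt_mlp(x, W_down, W_up, scale=0.1):
    # Bottleneck adapter sketch: down-project, ReLU, up-project, scale.
    # The scale value is an assumption for illustration.
    return scale * (np.maximum(x @ W_down, 0.0) @ W_up)

def block_forward(x, attn, mlp, W_down, W_up):
    # Attention sub-block with residual (attn is a stand-in callable).
    x = x + attn(layer_norm(x))
    # Per the released code: the adapter branch takes x AFTER the
    # attention residual, i.e. BEFORE the second LayerNorm ...
    adapt_out = adapt_mlp(x, W_down, W_up)
    # ... while the main MLP branch goes through LayerNorm as usual,
    # and both are added back in the same residual step.
    x = x + mlp(layer_norm(x)) + adapt_out
    return x
```

The figure in v1 of the paper, by contrast, would correspond to computing `adapt_mlp(layer_norm(x), ...)` instead.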

Thanks for the reply. :)

Does equation (3) in v3 of the paper on arXiv still reflect the old Figure 2(b) from v1? It seems the LayerNorm is still present in equation (3).