ShoufaChen / AdaptFormer

[NeurIPS 2022] Implementation of "AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition"

Home Page: https://arxiv.org/abs/2205.13535


Question about AdaptFormer architecture

wngh1187 opened this issue

Hello. Thank you for releasing the code for this excellent research.
However, I noticed that the structure of AdaptFormer differs slightly between Figure 2(b) of the paper and the released code.
In the figure, the trainable layers (AdaptMLP) are shown receiving their input after the second LayerNorm.
In the code (line 79 in models/custom_modules.py), however, the AdaptMLP receives the output of the multi-head attention block as its input.
Which of the two is correct?
Please let me know if I have misunderstood.

Good catch.

The code is correct.
Thanks for pointing it out. We'll fix the figure in the next version.
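For anyone else reading this thread, here is a minimal NumPy sketch of the ordering the maintainers confirm the code uses: the adapter branch reads the token features right after the attention residual, in parallel with the LayerNorm → MLP branch. The function names, the ReLU bottleneck, and the 0.1 scaling factor are illustrative assumptions for this sketch, not the repository's actual API.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the last (feature) dimension.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def adapt_mlp(x, W_down, W_up, scale=0.1):
    # Bottleneck adapter sketch: down-project, ReLU, up-project, scale.
    # The scale value is an assumption for illustration.
    return scale * (np.maximum(x @ W_down, 0.0) @ W_up)

def block_forward(x, attn, mlp, W_down, W_up):
    # Attention sub-block with residual (attn is a stand-in callable).
    x = x + attn(layer_norm(x))
    # Per the released code: the adapter branch takes x AFTER the
    # attention residual, i.e. BEFORE the second LayerNorm ...
    adapt_out = adapt_mlp(x, W_down, W_up)
    # ... while the main MLP branch goes through LayerNorm as usual,
    # and both are added back in the same residual step.
    x = x + mlp(layer_norm(x)) + adapt_out
    return x
```

The figure in v1 of the paper, by contrast, would correspond to computing `adapt_mlp(layer_norm(x), ...)` instead.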

Thanks for the reply. :)

Does equation (3) in v3 of the paper on arXiv still reflect the old Figure 2(b) from v1? It seems the LayerNorm is still present in equation (3).