Question about AdaptFormer architecture
wngh1187 opened this issue · comments
Hello. I am deeply grateful to you for releasing the code of your amazing research.
However, I noticed that the structure of AdaptFormer differs slightly between Figure 2(b) of the paper and the released code.
In the paper's figure, the trainable AdaptMLP is shown taking its input after the second Layer Norm.
In the code, however (line 79 in models/custom_modules.py), AdaptMLP receives the output of the multi-head attention as its input.
Which of the two is correct?
Please let me know if I have misunderstood.
Good catch.
The code is correct.
Thanks for pointing it out. We'll fix the figure in the next version.
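For anyone else landing here, the answer above can be summarized as a block-level sketch. This is a simplified, framework-free illustration of the dataflow (the sublayer names and signatures are illustrative stand-ins, not the repo's actual API): AdaptMLP branches off the residual stream right after the attention sublayer, in parallel with the frozen MLP branch, rather than after the second Layer Norm.

```python
def adaptformer_block(x, norm1, attn, norm2, mlp, adaptmlp):
    """Sketch of one AdaptFormer block's forward pass.

    All arguments except `x` are callables standing in for the
    (frozen) pretrained sublayers and the trainable AdaptMLP;
    the names are hypothetical, chosen only for illustration.
    """
    # Frozen attention branch with residual connection.
    x = x + attn(norm1(x))
    # AdaptMLP taps the residual stream here, i.e. *after*
    # multi-head attention (matching the code), not after norm2
    # as Figure 2(b) of the paper suggests.
    adapt_out = adaptmlp(x)
    # Frozen MLP branch, with the adapter output added in parallel.
    x = x + mlp(norm2(x)) + adapt_out
    return x


# Toy demonstration with scalar stand-ins for each sublayer.
out = adaptformer_block(
    1.0,
    norm1=lambda v: v,        # identity stand-in
    attn=lambda v: 2 * v,     # stand-in attention
    norm2=lambda v: v,        # identity stand-in
    mlp=lambda v: v,          # stand-in MLP
    adaptmlp=lambda v: v,     # stand-in AdaptMLP
)
# attention branch: 1 + 2*1 = 3; then 3 + 3 + 3 = 9
print(out)
```

With these toy stand-ins, the adapter's input is 3 (the post-attention value), which is the distinction the figure and code disagree on.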
Thanks for the reply. :)