lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Why Remove PreNorm?

tonyyunyang opened this issue · comments

tonyyunyang commented:

May I ask why PreNorm was removed? I am very curious about the reason.

[image]

The Transformer encoder now differs from the architecture shown below, which is from the original paper:

[image: Transformer encoder diagram from the original paper]
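For context, here is a minimal sketch of the pre-norm pattern the question refers to: LayerNorm is applied to the input *before* each sub-layer (attention or feed-forward), with the residual connection added around it. The class and parameter names here are illustrative, not the repo's exact code.

```python
import torch
import torch.nn as nn

class PreNorm(nn.Module):
    """Wrap a sub-layer so LayerNorm runs on its input first (pre-norm)."""
    def __init__(self, dim, fn):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fn = fn

    def forward(self, x, **kwargs):
        # normalize first, then apply the wrapped sub-layer
        return self.fn(self.norm(x), **kwargs)

class FeedForward(nn.Module):
    """A standard transformer MLP block, used here as the wrapped sub-layer."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)

dim = 8
block = PreNorm(dim, FeedForward(dim, hidden_dim=16))
x = torch.randn(2, 4, dim)          # (batch, tokens, dim)
out = x + block(x)                  # residual connection around the pre-normed sub-layer
print(out.shape)                    # same shape as the input
```

Removing the standalone `PreNorm` wrapper does not necessarily remove pre-normalization itself; an equivalent refactor moves the `nn.LayerNorm` call to the first line of the attention and feed-forward modules, which computes the same function with one less wrapper class.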