lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Why Remove PreNorm?

tonyyunyang opened this issue · comments

tonyyunyang commented:

May I ask why PreNorm was removed? I am very curious about the reason.

[image]

The Transformer encoder now differs from the architecture shown below, which is from the original paper:

[image: Transformer encoder diagram from the original paper]
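For context, here is a minimal sketch of the pre-norm pattern the question refers to: LayerNorm is applied to the input *before* each sub-layer (attention or feed-forward), with the residual connection added around it. The class and parameter names here are illustrative, not the repo's exact code.

```python
import torch
import torch.nn as nn

class PreNorm(nn.Module):
    """Wrap a sub-layer so LayerNorm runs on its input first (pre-norm)."""
    def __init__(self, dim, fn):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fn = fn

    def forward(self, x, **kwargs):
        # normalize first, then apply the wrapped sub-layer
        return self.fn(self.norm(x), **kwargs)

class FeedForward(nn.Module):
    """A standard transformer MLP block, used here as the wrapped sub-layer."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)

dim = 8
block = PreNorm(dim, FeedForward(dim, hidden_dim=16))
x = torch.randn(2, 4, dim)          # (batch, tokens, dim)
out = x + block(x)                  # residual connection around the pre-normed sub-layer
print(out.shape)                    # same shape as the input
```

Removing the standalone `PreNorm` wrapper does not necessarily remove pre-normalization itself; an equivalent refactor moves the `nn.LayerNorm` call to the first line of the attention and feed-forward modules, which computes the same function with one less wrapper class.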