- The Illustrated Transformer (in Chinese) https://blog.csdn.net/longxinchen_ml/article/details/86533005
- Exploring the importance of warm-up and LayerNorm in Transformer (in Chinese) https://zhuanlan.zhihu.com/p/84614490
- https://pytorch.org/tutorials/beginner/transformer_tutorial.html
- Why do you transpose the input shape to (seq len, batch)? teddykoker/image-gpt#8