zrxbeijing / Explaining_Transformers

This is a step-by-step tutorial explaining the Transformer architecture. It is based on the great tutorial from Harvard NLP.


The paper "Attention Is All You Need" by Vaswani et al. introduced the Transformer, a very powerful language-modeling framework, in 2017. Devlin et al. from Google adapted this architecture for unsupervised language tasks and trained BERT, a ground-breaking language model that opened a new era for NLP.

Given the popularity of Transformers, understanding their architecture is important, yet far from trivial. The great tutorial "The Annotated Transformer" by Harvard NLP makes things much easier. However, when I read it, I realized the overall structure of the Transformer could be presented in even more detail. I therefore decided to record how I understand every detail of the architecture. Compared with "The Annotated Transformer", this note looks deeper into some key classes (such as the "MultiHeadAttention" class) and methods, and provides a friendlier, more verbose explanation for readers who find other tutorials, written for more experienced practitioners, somewhat unclear.
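To give a taste of the level of detail this note aims for, here is a minimal PyTorch sketch of multi-head attention in the spirit of The Annotated Transformer. It is an illustrative sketch, not the exact class from the tutorial; the class name, argument names, and dropout placement here are assumptions chosen to mirror the original paper's description.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Multi-head scaled dot-product attention (Vaswani et al., 2017).

    Illustrative sketch only; the tutorial's own class may differ in details.
    """

    def __init__(self, h, d_model, dropout=0.1):
        super().__init__()
        assert d_model % h == 0, "d_model must be divisible by the number of heads"
        self.d_k = d_model // h  # dimension of each head
        self.h = h
        # Three projections for queries, keys, values, plus one output projection.
        self.linears = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(4)])
        self.dropout = nn.Dropout(dropout)

    def forward(self, query, key, value, mask=None):
        batch = query.size(0)
        # 1) Project, then reshape to (batch, h, seq_len, d_k) so each head
        #    attends independently.
        query, key, value = [
            lin(x).view(batch, -1, self.h, self.d_k).transpose(1, 2)
            for lin, x in zip(self.linears, (query, key, value))
        ]
        # 2) Scaled dot-product attention within each head.
        scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = self.dropout(scores.softmax(dim=-1))
        x = torch.matmul(attn, value)
        # 3) Concatenate the heads and apply the final linear projection.
        x = x.transpose(1, 2).contiguous().view(batch, -1, self.h * self.d_k)
        return self.linears[-1](x)
```

A quick self-attention usage check, where queries, keys, and values are the same tensor:

```python
mha = MultiHeadAttention(h=8, d_model=512)
x = torch.randn(2, 10, 512)  # (batch, seq_len, d_model)
out = mha(x, x, x)
print(out.shape)  # torch.Size([2, 10, 512])
```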
