leaderj1001 / Transformer

Implementation of the "Attention Is All You Need" paper (Transformer model).


Attention Is All You Need - Transformer

Attention Is All You Need paper

Network Architecture

(figure: Transformer network architecture)

  • Training:

    • At training time, the whole target sentence except its last word is given as the output's input (teacher forcing).
  • Evaluation:

    • At evaluation time, the output's input starts with only the "\<sos\>" token, and each predicted word is appended to the output's input. Decoding stops when the "\<eos\>" token appears or the translation reaches max_len (see the sketch after this list).
  • Example:

    • Example Sentence: Several women wait outside in a city. (English) -> Mehrere Frauen warten in einer Stadt im Freien. (German)
    • Training:
      • Source sentence: Several women wait outside in a city.
      • Output's input: Mehrere Frauen warten in einer Stadt im
      • Target sentence: Mehrere Frauen warten in einer Stadt im Freien.
    • Evaluation:
      • Source sentence: Several women wait outside in a city.
      • Output's input: \<sos\>
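A rough sketch of the greedy evaluation loop described above, assuming PyTorch (`model`, `sos_idx`, `eos_idx`, and `max_len` are illustrative names, not necessarily this repository's actual interface):

```python
import torch

def greedy_decode(model, src, sos_idx, eos_idx, max_len):
    # The output's input starts with only the <sos> token.
    ys = torch.full((src.size(0), 1), sos_idx, dtype=torch.long)
    for _ in range(max_len - 1):
        logits = model(src, ys)                   # (batch, cur_len, vocab)
        next_word = logits[:, -1].argmax(dim=-1)  # most probable next token
        ys = torch.cat([ys, next_word.unsqueeze(1)], dim=1)
        if (next_word == eos_idx).all():          # every sentence has ended
            break
    return ys
```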

Positional Encoding


  • Input data shape: (batch_size, max_len)
  • Input Embedding output shape: (batch_size, max_len, embedding_dim)
  • Positional Encoding
    • Positional encoding assigns a fixed value to each position and embedding dimension, regardless of the content of the input sentence.
    • The encoding is computed for every position up to the sentence length.
  • Formula
    • PE(pos, 2i) = sin(pos / 10000^(2i / embedding_dim))
    • PE(pos, 2i+1) = cos(pos / 10000^(2i / embedding_dim))
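A minimal sketch of the sinusoidal positional encoding above, assuming PyTorch (the function name and signature are illustrative):

```python
import math
import torch

def positional_encoding(max_len, embedding_dim):
    # PE(pos, 2i)   = sin(pos / 10000^(2i / embedding_dim))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / embedding_dim))
    # Assumes embedding_dim is even.
    pe = torch.zeros(max_len, embedding_dim)
    pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
    div = torch.exp(torch.arange(0, embedding_dim, 2, dtype=torch.float)
                    * (-math.log(10000.0) / embedding_dim))
    pe[:, 0::2] = torch.sin(pos * div)  # even dimensions
    pe[:, 1::2] = torch.cos(pos * div)  # odd dimensions
    return pe  # (max_len, embedding_dim); added to the input embedding
```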

Scaled Dot-Product Attention


  • By taking the matrix product of the query and key and applying a softmax, we can see how strongly each word attends to every other word (see the sketch below).
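A minimal sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V, assuming PyTorch:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    d_k = q.size(-1)
    # scores[..., i, j]: how strongly word i attends to word j
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    attn = F.softmax(scores, dim=-1)  # attention weights over the keys
    return torch.matmul(attn, v), attn
```

The division by sqrt(d_k) keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients.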

Multi-Head Attention

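Multi-head attention projects the queries, keys, and values into `num_heads` smaller subspaces, runs scaled dot-product attention in each head, and concatenates the results. A minimal sketch, assuming PyTorch and reusing `scaled_dot_product_attention` from the section above:

```python
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.d_k = d_model // num_heads
        self.num_heads = num_heads
        self.w_q = nn.Linear(d_model, d_model)  # query projection
        self.w_k = nn.Linear(d_model, d_model)  # key projection
        self.w_v = nn.Linear(d_model, d_model)  # value projection
        self.w_o = nn.Linear(d_model, d_model)  # output projection

    def forward(self, q, k, v, mask=None):
        batch = q.size(0)
        # (batch, len, d_model) -> (batch, num_heads, len, d_k)
        q, k, v = [w(x).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)
                   for w, x in zip((self.w_q, self.w_k, self.w_v), (q, k, v))]
        # Reuses scaled_dot_product_attention from the sketch above.
        out, _ = scaled_dot_product_attention(q, k, v, mask)
        # Concatenate the heads back to (batch, len, d_model).
        out = out.transpose(1, 2).contiguous().view(batch, -1, self.num_heads * self.d_k)
        return self.w_o(out)
```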

Feed Forward

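The position-wise feed-forward network applies two linear layers with a ReLU in between, FFN(x) = max(0, xW1 + b1)W2 + b2, identically at every position. A minimal sketch, assuming PyTorch (`d_ff` is the inner dimension, 2048 in the paper):

```python
import torch.nn as nn

class PositionWiseFeedForward(nn.Module):
    def __init__(self, d_model, d_ff=2048):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)  # expand: d_model -> d_ff
        self.fc2 = nn.Linear(d_ff, d_model)  # project back: d_ff -> d_model
        self.relu = nn.ReLU()

    def forward(self, x):
        # Applied independently at each position of the sequence.
        return self.fc2(self.relu(self.fc1(x)))
```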

Add & Norm
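Each sub-layer is wrapped in a residual connection followed by layer normalization, i.e. LayerNorm(x + Sublayer(x)). A minimal sketch, assuming PyTorch:

```python
import torch.nn as nn

class AddNorm(nn.Module):
    def __init__(self, d_model, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer_out):
        # Residual connection followed by layer normalization.
        return self.norm(x + self.dropout(sublayer_out))
```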

Masking

  • Encoder
    • Masking is also used in the encoder: sentences shorter than max_len are filled with the padding token (index 1), so those padded positions are masked.
  • Decoder
    • Future words are masked, because the current word must not be predicted by looking at the words that come after it (see the sketch below).
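A minimal sketch of both masks, assuming PyTorch (`pad_idx=1` matches the padding index mentioned above; the function names are illustrative):

```python
import torch

def make_pad_mask(seq, pad_idx=1):
    # Encoder mask: hide padded positions (index 1) in short sentences.
    # Shape (batch, 1, 1, len) broadcasts over heads and query positions.
    return (seq != pad_idx).unsqueeze(1).unsqueeze(2)

def make_subsequent_mask(size):
    # Decoder mask: position i may only attend to positions j <= i,
    # so the current word cannot be predicted from future words.
    return torch.tril(torch.ones(size, size, dtype=torch.bool))
```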

Todo

  • execution example
  • code example
  • code refactoring & code translation
