dev1ashish / Attention-was-all-i-needed


Python PyTorch AI/ML NLP Transformers

Language Translation using transformers

This repository contains a work-in-progress implementation (training is incomplete due to limited compute) of the Transformer model described in the paper "Attention Is All You Need" by Vaswani et al. The Transformer is a novel architecture for handling sequential data, particularly in natural language processing (NLP). It introduces self-attention, which lets the model weigh the importance of different parts of the input sequence when generating the output.

Transformer Architecture Overview

The Transformer model consists of an encoder-decoder structure, with each part consisting of multiple layers. The encoder processes the input sequence and maps it to a sequence of continuous representations. These representations are then fed into the decoder, which generates the output sequence. The model is auto-regressive, meaning it consumes previously generated symbols as additional input when generating the next symbol.
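The encode-once, decode-step-by-step loop above can be sketched with PyTorch's built-in `nn.Transformer`. This is a minimal illustration, not this repository's code; the vocabulary size, model dimensions, and start-of-sequence id are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only
vocab_size, d_model = 100, 32

embed = nn.Embedding(vocab_size, d_model)
model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)
out_proj = nn.Linear(d_model, vocab_size)

src = torch.randint(0, vocab_size, (1, 10))  # source sentence (token ids)
memory = model.encoder(embed(src))           # encode the input once

# Auto-regressive decoding: feed previously generated symbols back in
generated = torch.tensor([[1]])              # assumed start-of-sequence id
for _ in range(5):
    tgt = embed(generated)
    # Causal mask so each position only attends to earlier positions
    mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
    dec = model.decoder(tgt, memory, tgt_mask=mask)
    next_token = out_proj(dec[:, -1]).argmax(-1, keepdim=True)
    generated = torch.cat([generated, next_token], dim=1)

print(generated.shape)  # (1, 6): start token plus 5 generated tokens
```

In practice the loop stops at an end-of-sequence token and uses trained weights; here the untrained model just demonstrates the data flow.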

Key Components

  • Encoder: Transforms the input sequence into a sequence of continuous representations.
  • Decoder: Generates the output sequence based on the encoder's output and the previously generated symbols.
  • Self-Attention Mechanism: Allows the model to focus on different parts of the input sequence when generating the output.
  • Positional Encoding: Adds information about the position of tokens in the sequence, as the Transformer does not inherently understand the order of the sequence.
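The last two components can be sketched in a few lines. Below is a minimal, illustrative version of the sinusoidal positional encoding from the paper and of scaled dot-product self-attention; for simplicity the attention uses the input directly as queries, keys, and values, without the learned projections or multiple heads a real model would have.

```python
import math
import torch

def positional_encoding(seq_len, d_model):
    # Sinusoidal encoding from the paper:
    #   PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    #   PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    pos = torch.arange(seq_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

def self_attention(x):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V,
    # with Q = K = V = x (no learned projections, single head)
    d = x.size(-1)
    scores = x @ x.transpose(-2, -1) / math.sqrt(d)
    return torch.softmax(scores, dim=-1) @ x

x = torch.randn(4, 8) + positional_encoding(4, 8)  # 4 tokens, d_model = 8
out = self_attention(x)
print(out.shape)  # torch.Size([4, 8])
```

Because the attention itself is permutation-invariant, adding the positional encoding to the embeddings is what lets the model distinguish token order.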

Paper

For a deeper understanding of the Transformer model and its components, refer to the original paper, "Attention Is All You Need" (Vaswani et al., 2017).

Languages

Python 59.6%, Jupyter Notebook 40.4%