
BETSI

arXiv paper

A light implementation of the 2017 Google paper 'Attention Is All You Need'. BETSI is the name of the model, a recursive acronym standing for BETSI: English To Shitty Italian, since the training time I allowed on my graphics card was not enough for amazing results.

For this implementation I chose translation from English to Italian, since Transformer models are exceptional at language translation and it seems to be a common task for light implementations of this paper.

The dataset I am using is the OPUS Books dataset, a collection of copyright-free books. The book content of these translations is free for personal, educational, and research use. See the OPUS language resource paper.
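
If you want to poke at the data yourself, here is a minimal sketch of loading the English-Italian split, assuming the Hugging Face `datasets` package (not necessarily how this repo loads it):

```python
# Minimal sketch: load the OPUS Books English-Italian parallel corpus.
# Assumes the Hugging Face `datasets` package; this repo may load the data differently.
from datasets import load_dataset

raw = load_dataset("opus_books", "en-it", split="train")

pair = raw[0]["translation"]  # {"en": "...", "it": "..."}
print(pair["en"], "->", pair["it"])
```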

Notes

I'm creating notes as I go, which can be found in NOTES.md.

Transformer model architecture

(figure: the Transformer model architecture from the paper)
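
For orientation, the same encoder-decoder shape can be expressed with PyTorch's built-in module. This is only a sketch of the paper's base configuration; this repo builds the blocks from scratch instead (see the TODO list below):

```python
# Sketch of the paper's base configuration using PyTorch's built-in
# nn.Transformer (this repo implements the components by hand instead).
import torch
import torch.nn as nn

model = nn.Transformer(
    d_model=512,           # embedding width used in the paper
    nhead=8,               # attention heads
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048,
    dropout=0.1,
    batch_first=True,
)

src = torch.rand(2, 10, 512)  # (batch, src_len, d_model)
tgt = torch.rand(2, 7, 512)   # (batch, tgt_len, d_model)
out = model(src, tgt)         # -> (2, 7, 512)
```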

Requirements

There is a requirements.txt with the packages needed to run this. I used PyTorch with ROCm, which sped up training dramatically: training this model on my laptop's CPU takes around 5.5 hours per epoch, while training on my desktop's GPU takes around 13.5 minutes (24.4 times faster!).
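
One useful detail: PyTorch's ROCm builds expose AMD GPUs through the same CUDA device API, so the usual device-selection idiom works unchanged:

```python
import torch

# ROCm builds of PyTorch report AMD GPUs via the torch.cuda API,
# so no AMD-specific code is needed here.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"training on {device}")
```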

TODO and tentative timeline:

  • Input Embeddings
  • Positional Encoding (sketched after this list)
  • Layer Normalization - Due by 11/1
  • Feed forward
  • Multi-Head attention
  • Residual Connection
  • Encoder
  • Decoder - Due by 11/8
  • Linear Layer
  • Transformer
  • Tokenizer - Due by 11/15
  • Dataset
  • Training loop
  • Visualization of the model - Due by 11/22
  • Install AMD ROCm to train with GPU - Attempt to do by the end
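
As a taste of the components above, here is a minimal standalone sketch of the sinusoidal positional encoding from the paper (the module in this repo may differ):

```python
# Sketch of the sinusoidal positional encoding from the paper:
#   PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
#   PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
# A minimal standalone version; the module in this repo may differ.
import math
import torch

def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    pe = torch.zeros(seq_len, d_model)
    position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)  # (seq_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float)
        * (-math.log(10000.0) / d_model)
    )                                                                 # (d_model / 2,)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

pe = positional_encoding(seq_len=128, d_model=512)
print(pe.shape)  # torch.Size([128, 512])
```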

References used

  • Vaswani et al., 'Attention Is All You Need' (2017). arXiv:1706.03762
  • Tiedemann, 'Parallel Data, Tools and Interfaces in OPUS' (LREC 2012)


License: MIT License

