Attention (wip)

This repository will house a visualization that attempts to convey instant enlightenment of how Attention works to someone outside the field of artificial intelligence. I believe this algorithm to be one of the most important developments in the history of deep learning; we may be able to use it to solve, well, everything. A live version is hosted at https://lucidrains.github.io/attention

In my mind, one good intuitive visualization can bring about more insight and understanding than lengthy, highly priced tutoring or courses.
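Concretely, the object the visualization will cover is the scaled dot-product attention from the Vaswani et al. paper cited below. As a reference point, here is a minimal numpy sketch of it (shapes and names are illustrative, not from this repository):

```python
import numpy as np

def softmax(x, axis = -1):
    x = x - x.max(axis = axis, keepdims = True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis = axis, keepdims = True)

def attention(q, k, v):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)  # pairwise similarity of queries and keys
    return softmax(scores) @ v                      # values averaged, weighted by similarity

# toy self-attention: 8 tokens of dimension 16, with queries / keys / values all taken from the same tokens
tokens = np.random.randn(8, 16)
out = attention(tokens, tokens, tokens)
print(out.shape)  # (8, 16)
```

Each output token is simply a weighted average of the value vectors, with the weights determined by how strongly its query matches each key.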

Why does it work?

Attention has many interpretations, ranging from physics-based analogies to speculations about biological plausibility.

Update: Recently, three papers have concurrently closed in on a connection between self-attention and gradient descent while investigating the in-context learning properties of Transformers (see the sketch after this list)!

  1. Transformers learn in-context by gradient descent
  2. What learning algorithm is in-context learning? Investigations with linear models
  3. Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
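Below is a minimal numpy sketch of the core identity these papers build on, loosely following the construction in the first paper: an unnormalized linear self-attention layer reading in-context examples (x_i, y_i) makes the same prediction for a query x_q as a linear model trained with one step of gradient descent from zero weights. All names and dimensions here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# in-context linear regression examples: y_i = W_true x_i
n, d_in, d_out = 16, 4, 2
W_true = rng.normal(size = (d_out, d_in))
X = rng.normal(size = (n, d_in))          # context inputs x_i
Y = X @ W_true.T                          # context targets y_i
x_q = rng.normal(size = (d_in,))          # query token
lr = 0.1

# one gradient descent step on 0.5 * sum_i ||W x_i - y_i||^2, starting at W = 0
grad_at_zero = -(Y.T @ X)                 # dL/dW evaluated at W = 0
W_one_step = -lr * grad_at_zero           # = lr * sum_i y_i x_i^T
pred_gd = W_one_step @ x_q

# unnormalized linear self-attention: keys = x_i, values = y_i, query = x_q
scores = X @ x_q                          # <x_i, x_q> for every context example
pred_attn = lr * (Y.T @ scores)           # lr * sum_i y_i <x_i, x_q>

assert np.allclose(pred_gd, pred_attn)    # the two predictions coincide exactly
```

The papers go much further (multiple steps, deeper Transformers, and learned rather than constructed weights), but this identity is the seed of the connection.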

What has Attention accomplished?

I will keep adding to this list as time goes on.

Other resources

Dive into Deep Learning, redone by Quanta Magazine

Is it all we need?

No one really knows. All I know is that if attention were ever dethroned by an even better algorithm, it would be over. Part of what motivates me to do some scalable, 21st-century teaching is the hope that someone may find a way to improve on attention, or discover its replacement. It just takes one discovery!

Potential improvements

Appreciation

A large thanks goes to 3Blue1Brown for showing us that complex mathematics can be taught with such elegance and potency through visualization.

Citations

```bibtex
@misc{vaswani2017attention,
    title   = {Attention Is All You Need},
    author  = {Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
    year    = {2017},
    eprint  = {1706.03762},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}
```

```bibtex
@article{Bahdanau2015NeuralMT,
    title   = {Neural Machine Translation by Jointly Learning to Align and Translate},
    author  = {Dzmitry Bahdanau and Kyunghyun Cho and Yoshua Bengio},
    journal = {CoRR},
    year    = {2015},
    volume  = {abs/1409.0473}
}
```

Gotta teach the AGI to love. - Ilya Sutskever
