r0mainK / outperformer

Code for scaling Transformers

Home Page: https://keramitas.io

Outperformer

Repository containing the implementations related to my blog post series on scaling Transformers. Check the posts out for a detailed explanation of the code and additional information (like reference papers and related codebases).

For now, the codebase is split across three files:

  • implementation of fast attention in the fast_attention.py file (a short sketch of the idea is given after this list)
  • implementation of reversible layers in the reversible.py file (also sketched after this list)
  • implementation of a headless Reformer + Performer model (a BERT-like MLM with the above modifications) in the performer.py file
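
To give a feel for what the first file covers, here is a minimal sketch of kernel-based fast attention with positive random features (the FAVOR+ idea from the Performer paper). It is written in plain PyTorch with assumed shapes and names - the functions below are illustrative, not the actual API of fast_attention.py:

```python
import math

import torch


def positive_random_features(x, projection):
    """Positive random feature map approximating the softmax kernel.

    x: (batch, seq_len, dim), projection: (dim, num_features).
    """
    x = x / math.sqrt(math.sqrt(x.shape[-1]))  # fold the 1/sqrt(d) softmax scaling into q and k
    projected = x @ projection
    squared_norm = (x ** 2).sum(dim=-1, keepdim=True) / 2
    return torch.exp(projected - squared_norm) / math.sqrt(projection.shape[-1])


def fast_attention(q, k, v, num_features=256):
    """Linear-complexity approximation of softmax(q @ k^T / sqrt(d)) @ v."""
    projection = torch.randn(q.shape[-1], num_features, device=q.device)
    q_prime = positive_random_features(q, projection)  # (batch, n, m)
    k_prime = positive_random_features(k, projection)  # (batch, n, m)
    kv = torch.einsum("bnm,bnd->bmd", k_prime, v)  # phi(K)^T V, computed once
    normalizer = q_prime @ k_prime.sum(dim=1).unsqueeze(-1)  # (batch, n, 1)
    return torch.einsum("bnm,bmd->bnd", q_prime, kv) / (normalizer + 1e-6)


# Example: attention over a 1024-token sequence without the O(n^2) score matrix
q = k = v = torch.randn(2, 1024, 64)
out = fast_attention(q, k, v)  # (2, 1024, 64)
```

Because phi(K)^T V and the normalizer are computed once and reused for every query, the cost grows linearly with sequence length instead of quadratically. Note that this sketch is bidirectional (non-causal), which is the case that matters for a BERT-like MLM.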

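In the same spirit, reversible.py builds on the reversible residual coupling used in RevNets and the Reformer. The sketch below only shows the forward coupling and its inverse; the class and method names are hypothetical, and a real implementation would also wire this into the backward pass so activations are recomputed instead of stored:

```python
import torch
import torch.nn as nn


class ReversibleBlock(nn.Module):
    """Reversible residual coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)."""

    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f = f  # e.g. an attention sub-layer
        self.g = g  # e.g. a feed-forward sub-layer

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Inputs can be recovered from outputs, so intermediate activations
        # do not need to be kept in memory during training.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2


# Sanity check: the block really is invertible (up to floating-point error)
dim = 64
block = ReversibleBlock(nn.Linear(dim, dim), nn.Linear(dim, dim))
x1, x2 = torch.randn(2, 16, dim), torch.randn(2, 16, dim)
y1, y2 = block(x1, x2)
r1, r2 = block.inverse(y1, y2)
assert torch.allclose(x1, r1, atol=1e-5) and torch.allclose(x2, r2, atol=1e-5)
```
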
If you have any questions (and couldn't find an answer in the posts), feel free to open an issue!

Regarding contributions, bug reports (and fixes) are greatly appreciated - although I hope there won't be any :p I don't know yet which direction this repository will take, whether it will stay as is or incorporate additional features, so if you have ideas please open an issue to discuss them! Any new feature should be in the spirit of the existing code: aimed at scaling Transformer MLMs through architectural innovations.

If you end up contributing, please review the guidelines first.

All of this is released under the MIT License, so feel free to use it as you wish :D
