avain / 2023-transformers-rotf

Code for the paper "A mathematical perspective on Transformers".

A mathematical perspective on Transformers

Python code for the paper "A mathematical perspective on Transformers" by Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, and Philippe Rigollet.

Abstract

Transformers play a central role in the inner workings of large language models. We develop a mathematical framework for analyzing Transformers based on their interpretation as interacting particle systems, which reveals that clusters emerge in long time. Our study explores the underlying theory and offers new perspectives for mathematicians as well as computer scientists.
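The particle-system view can be illustrated with a small simulation. The following is a minimal sketch, not the code shipped in this repository: it assumes a simplified self-attention dynamics in which tokens are points on the unit sphere, the query, key, and value matrices are all the identity, and each token moves toward the softmax-weighted average of the others. The function names (`attention_step`, `simulate`, `min_pairwise_inner_product`) and the parameters `beta`, `dt`, and `steps` are hypothetical choices made for illustration.

```python
import numpy as np


def attention_step(x, beta=1.0, dt=0.1):
    """One explicit Euler step of the assumed attention dynamics on the sphere.

    x has shape (n, d); each row is a unit vector (one token / particle).
    """
    # Row-wise softmax of beta * <x_i, x_j>: the attention weights.
    logits = beta * (x @ x.T)
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Attention output: weighted average of all particles.
    drift = weights @ x

    # Project the drift onto the tangent space of the sphere at each x_i.
    drift -= np.sum(drift * x, axis=1, keepdims=True) * x

    # Euler step, then renormalize so particles stay on the unit sphere.
    x_new = x + dt * drift
    return x_new / np.linalg.norm(x_new, axis=1, keepdims=True)


def simulate(n=32, d=3, beta=1.0, dt=0.1, steps=2000, seed=0):
    """Evolve n random unit vectors in R^d for a given number of steps."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    for _ in range(steps):
        x = attention_step(x, beta=beta, dt=dt)
    return x


def min_pairwise_inner_product(x):
    """Smallest inner product between any two particles (1.0 means one cluster)."""
    return float((x @ x.T).min())


if __name__ == "__main__":
    start = simulate(steps=0)
    end = simulate(steps=2000)
    print("min <x_i, x_j> at start:", min_pairwise_inner_product(start))
    print("min <x_i, x_j> at end:  ", min_pairwise_inner_product(end))
```

Under these assumptions, the smallest pairwise inner product moving toward 1 over time indicates that the particles are collapsing into a cluster. Increasing `beta` sharpens the attention weights and can be used to explore how the clustering speed changes.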

Citing

@article{geshkovski2023perspective,
      title={A mathematical perspective on Transformers}, 
      author={Borjan Geshkovski and Cyril Letrouit and Yury Polyanskiy and Philippe Rigollet},
      year={2023},
      eprint={2312.10794},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Languages

Python 54.8%, Jupyter Notebook 45.2%