pvalle6 / Tokenizer_and_Bigram

This is my simple and readable implementation of the Byte Pair Encoding Algorithm and a Bigram Model.

This is a simple implementation of the Byte Pair Encoding (BPE) algorithm, which is used to tokenize text, and a bigram word model.

Both were created as part of my research work on language models and are my own original implementations!
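
For readers new to the two techniques, here is a minimal sketch of how they can work. It is not the code from this repository: the function names (`train_bpe`, `merge_pair`, `encode`, `bigram_counts`, `next_word_prob`) are hypothetical, and it assumes byte-level BPE where new token ids start at 256 and a bigram model estimated from plain word counts without smoothing.

```python
from collections import Counter, defaultdict


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))  # base vocabulary: byte values 0..255
    merges = {}  # (left_id, right_id) -> new token id, in learning order
    for step in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))  # count adjacent pairs
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        if pairs[best] < 2:
            break  # no pair repeats, so merging gains nothing
        new_id = 256 + step
        merges[best] = new_id
        ids = merge_pair(ids, best, new_id)
    return merges


def encode(text, merges):
    """Tokenize `text` by replaying the learned merges in order."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge_pair(ids, pair, new_id)
    return ids


def bigram_counts(words):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_prob(counts, prev, nxt):
    """Estimate P(nxt | prev) from raw counts (0.0 if `prev` was never followed)."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0
```

In this sketch, `train_bpe` would be run once on a training text, `encode` then turns new text into token ids, and `bigram_counts` takes a list of words such as `text.split()`.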

https://www.linkedin.com/in/peter-v-334609211/

About

License: GNU General Public License v3.0


Languages

Language: Python 100.0%