juand-r / tiny_tokenizer

A word-level tokenizer for TinyStories data

tiny_tokenizer

A word-level tokenizer for TinyStories data

Made with help and thoughts from https://github.com/tdooms, Dan Braun, Juan Diego Rodriguez, and Mat Allen.
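
For readers unfamiliar with the term, the following is a minimal sketch of what a word-level tokenizer does: split text into words, assign each word an integer id, and map unseen words to an unknown token. It is not this repository's code or API. The names `split_words`, `build_vocab`, `encode`, `decode`, the `<unk>` token, and the splitting regex are all assumptions made for illustration.

```python
import re
from collections import Counter

UNK = "<unk>"  # hypothetical unknown-word token, not taken from the repo


def split_words(text):
    # Assumed splitting rule: lowercase, then take runs of letters, digits,
    # and apostrophes as words, and each other non-space character alone.
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())


def build_vocab(texts, max_size=10000):
    # Keep the most frequent words; id 0 is reserved for the unknown token.
    counts = Counter(w for t in texts for w in split_words(t))
    vocab = {UNK: 0}
    for word, _ in counts.most_common(max_size - 1):
        vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    # Words missing from the vocabulary map to the unknown token's id.
    return [vocab.get(w, vocab[UNK]) for w in split_words(text)]


def decode(ids, vocab):
    # Join tokens with spaces; original spacing and casing are not restored.
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)


stories = ["Once upon a time, there was a cat.", "The cat was happy."]
vocab = build_vocab(stories)
ids = encode("The dog was happy.", vocab)
print(ids)
print(decode(ids, vocab))
```

In this sketch, "dog" never appears in the sample stories, so it is encoded as the unknown id and decoded back as `<unk>`.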

About

License: MIT License


Languages

Python 64.3%, Jupyter Notebook 35.7%