AugustasMacijauskas / trailtoken

An application that visualises LLM tokenizers

trailtoken

A tool to visualise open source LLM tokenizers.

Usage

To launch the frontend locally, run the following commands:

cd frontend/
npm install
npm run dev

and you should be able to access the website at http://localhost:3000/trailtoken

To run the backend locally, execute:

cd backend/
pip install -r requirements.txt
python src/main.py

and you should be able to make requests to http://127.0.0.1:5000/. In particular, text can be tokenized by sending a request to http://127.0.0.1:5000/tokenize, whose body has the following structure:

{
  "tokenizer_name": string,
  "input_text": string
}
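As an illustration, a request with this body can be sent from Python using only the standard library. This is a minimal sketch that assumes the endpoint accepts a POST request with a JSON body and replies with JSON; neither the HTTP method nor the response format is documented here, and the helper names build_tokenize_request and tokenize are hypothetical.

```python
import json
import urllib.request

# Assumption: the backend started with `python src/main.py` listens here.
BASE_URL = "http://127.0.0.1:5000"


def build_tokenize_request(tokenizer_name: str, input_text: str,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a request for the /tokenize endpoint.

    Hypothetical helper; assumes the endpoint expects a POST with a JSON body.
    """
    body = json.dumps({
        "tokenizer_name": tokenizer_name,
        "input_text": input_text,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/tokenize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokenize(tokenizer_name: str, input_text: str, base_url: str = BASE_URL):
    """Send the request and decode the reply.

    Assumes the backend replies with JSON; its exact schema is not documented
    here, so the decoded value is returned unchanged.
    """
    request = build_tokenize_request(tokenizer_name, input_text, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, tokenize("gpt2", "Hello, world!") would send the body {"tokenizer_name": "gpt2", "input_text": "Hello, world!"}. Here "gpt2" is only a placeholder; substitute a tokenizer name the backend actually supports. Keeping request construction separate from sending makes the request easy to inspect without a running server.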

Other useful frontend commands are:

npm run lint  # lint the code
npm run build # build the site
npm run start # run the built site

Tests

Check backend test coverage by running the following from the backend/ directory:

pytest --cov-report=term-missing:skip-covered --cov=src/

Acknowledgements

Inspired by Andrej Karpathy's video on tokenization and a similar tool for visualising OpenAI tokenizers.

Cite

You can cite this work using the following BibTeX entry:

@misc{trailtoken2024,
  author = {Lopata, Laurynas and Macijauskas, Augustas},
  title = {trailtoken: {O}pen {S}ource {LLM} {T}okenizer {V}isualisation {T}ool},
  year = {2024},
  howpublished = {\url{https://augustasmacijauskas.github.io/trailtoken/}},
  note = {Accessed: 2024-04-18}
}
