mlfoundations / open_lm

A repository for research on medium-sized language models.

Tokenization on-the-fly without slowdown

sagadre opened this issue

Benchmark on-the-fly tokenization and make training with it as fast as training on pre-tokenized data.
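The Python below is one possible way to set up the comparison, offered only as a minimal sketch. It measures tokens per second for three pipelines: pre-tokenized data, tokenization inline with the loop, and tokenization run ahead in a bounded background thread pool, each with a simulated training step. toy_tokenize, prefetch_tokenize, tokens_per_second, and the sleep-based step are hypothetical stand-ins, not open_lm code. A real benchmark would swap in the actual tokenizer and training loop.

```python
import time
import zlib
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def toy_tokenize(text: str) -> List[int]:
    # Hypothetical stand-in tokenizer: hashes whitespace-separated words into a 50k vocab.
    return [zlib.crc32(word.encode("utf-8")) % 50_000 for word in text.split()]


def prefetch_tokenize(
    docs: Iterable[str],
    tokenize: Callable[[str], List[int]],
    workers: int = 4,
    depth: int = 16,
) -> Iterator[List[int]]:
    # Keep up to `depth` documents in flight so tokenization overlaps with the consumer.
    it = iter(docs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = deque(pool.submit(tokenize, d) for d in islice(it, depth))
        while pending:
            tokens = pending.popleft().result()  # FIFO order preserves input order
            for d in islice(it, 1):  # refill the window with the next document, if any
                pending.append(pool.submit(tokenize, d))
            yield tokens


def tokens_per_second(stream: Iterable[List[int]], step_seconds: float = 0.0) -> float:
    # Consume a token stream, optionally sleeping to mimic one training step per document.
    start = time.perf_counter()
    total = 0
    for tokens in stream:
        total += len(tokens)
        if step_seconds:
            time.sleep(step_seconds)
    elapsed = time.perf_counter() - start
    return total / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    docs = ["the quick brown fox jumps over the lazy dog " * 200] * 500
    pretokenized = [toy_tokenize(d) for d in docs]  # baseline: tokenized ahead of time
    step = 0.001  # hypothetical per-document training-step cost, in seconds
    print("pre-tokenized:", tokens_per_second(iter(pretokenized), step))
    print("on-the-fly:   ", tokens_per_second((toy_tokenize(d) for d in docs), step))
    print("prefetched:   ", tokens_per_second(prefetch_tokenize(docs, toy_tokenize), step))
```

The prefetch window only hides tokenization cost if tokenization actually runs concurrently with the step. With threads, that requires the step (here time.sleep) or the tokenizer itself to release the GIL. Otherwise, worker processes would be needed instead.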