mlfoundations / open_lm

A repository for research on medium sized language models.

Support user specified token pre-processing functions

sagadre opened this issue

Often we have special control tokens that need to be handled when creating the inputs and targets. To allow maximum flexibility, users should be able to provide their own sample_chunk functions or similar.
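
One possible shape for this, as a minimal sketch rather than open_lm's actual API: the training code accepts a callable that turns a token chunk into (inputs, targets), with a default that does the usual next-token shift. All names here (default_sample_chunk, make_masking_sample_chunk, build_example, IGNORE_INDEX) are hypothetical, and the sketch uses plain Python lists instead of tensors to stay self-contained.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Assumption: the loss is configured to skip targets equal to this value.
IGNORE_INDEX = -100

# A pre-processing function maps a chunk of token ids to (inputs, targets).
TokenFn = Callable[[Sequence[int]], Tuple[List[int], List[int]]]


def default_sample_chunk(chunk: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Standard next-token setup: targets are the inputs shifted by one."""
    tokens = list(chunk)
    return tokens[:-1], tokens[1:]


def make_masking_sample_chunk(control_tokens: Iterable[int]) -> TokenFn:
    """Build a user-specified function that masks control tokens in the targets."""
    control = set(control_tokens)

    def sample_chunk(chunk: Sequence[int]) -> Tuple[List[int], List[int]]:
        inputs, targets = default_sample_chunk(chunk)
        # Replace control-token targets so they do not contribute to the loss.
        targets = [IGNORE_INDEX if t in control else t for t in targets]
        return inputs, targets

    return sample_chunk


def build_example(
    chunk: Sequence[int], preprocess_fn: TokenFn = default_sample_chunk
) -> Tuple[List[int], List[int]]:
    """Hypothetical hook point: call whichever function the user supplied."""
    inputs, targets = preprocess_fn(chunk)
    if len(inputs) != len(targets):
        raise ValueError("preprocess_fn must return inputs and targets of equal length")
    return inputs, targets
```

With this shape, a user who wants a (hypothetical) control token with id 3 excluded from the loss would call build_example(chunk, make_masking_sample_chunk([3])), while existing callers keep the default behavior.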