stanford-crfm / mistral

Mistral: A strong, northwesterly wind: Framework for transparent and accessible large-scale language model training, built with Hugging Face 🤗 Transformers.

Add EOS token when concatenating documents in preprocessing loop

siddk opened this issue

As per #90, we currently do not add an EOS separator between documents when concatenating them. We should add one, to facilitate unprompted generation in the future.
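For concreteness, here is a rough sketch of what the change could look like. It is not the actual preprocessing code in this repo: `concat_with_eos` and its parameters are hypothetical names, and it assumes documents are already tokenized into lists of ids and then packed into fixed-size blocks, with any trailing remainder dropped.

```python
from itertools import chain
from typing import Iterable, List


def concat_with_eos(
    docs: Iterable[List[int]], eos_token_id: int, block_size: int
) -> List[List[int]]:
    """Hypothetical helper: join tokenized documents with EOS, then chunk."""
    # Append EOS after every document so boundaries survive concatenation.
    stream = list(chain.from_iterable(doc + [eos_token_id] for doc in docs))
    # Keep only full blocks; the trailing remainder is dropped (an assumption).
    usable = (len(stream) // block_size) * block_size
    return [stream[i : i + block_size] for i in range(0, usable, block_size)]


# 50256 is the id of GPT-2's <|endoftext|> token.
docs = [[10, 11, 12], [20, 21], [30]]
blocks = concat_with_eos(docs, eos_token_id=50256, block_size=4)
```

With a real tokenizer the id would presumably come from `tokenizer.eos_token_id` rather than being hard-coded.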

In the process, we should also probably add some strict tests checking preprocessing invariants like this, among others.
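A minimal sketch of the kind of invariant check this could mean, assuming the concatenated token stream is available before chunking and that no document contains the EOS id itself. `split_on_eos` and `check_eos_invariants` are hypothetical helpers, not existing functions in this repo.

```python
from typing import List, Sequence


def split_on_eos(stream: Sequence[int], eos_token_id: int) -> List[List[int]]:
    """Split a flat token stream back into documents at each EOS token."""
    docs: List[List[int]] = []
    current: List[int] = []
    for tok in stream:
        if tok == eos_token_id:
            docs.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        # Trailing tokens with no closing EOS form a partial document.
        docs.append(current)
    return docs


def check_eos_invariants(
    stream: Sequence[int], docs: Sequence[Sequence[int]], eos_token_id: int
) -> None:
    """Raise AssertionError if the stream violates the EOS invariants."""
    # Assumption: raw documents never contain the EOS id themselves.
    assert all(eos_token_id not in doc for doc in docs)
    # Exactly one EOS per document.
    assert sum(tok == eos_token_id for tok in stream) == len(docs)
    # No tokens were lost or added apart from the separators.
    assert len(stream) == sum(len(doc) for doc in docs) + len(docs)
    # Splitting on EOS recovers the original documents in order.
    assert split_on_eos(stream, eos_token_id) == [list(doc) for doc in docs]
```

A test could call `check_eos_invariants` on a small handful of documents and additionally assert that a stream built without separators fails the check.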

i missed this issue. i propose we punt for now, but it's easy to fix if we want? cc @J38