Ban some tokens
AnnaKholkina opened this issue
Hello. I would like the model not to use certain tokens (such as \n). When training a model, can I remove unnecessary tokens from the tokenizer, and if so, how? And how would removing most of the tokens affect training quality? (Let's say I want to train a model to speak only one language.)
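
To make the question concrete: one way to ban tokens without editing the tokenizer at all is to mask their logits before sampling, so that they get zero probability. The following is only a minimal sketch of that idea in plain Python. The function names `ban_tokens` and `softmax`, the 4-token vocabulary, and the token id standing in for \n are all hypothetical and not taken from any particular library.

```python
import math

def ban_tokens(logits, banned_ids):
    """Return a copy of the logits with every banned token id set to -inf."""
    masked = list(logits)
    for token_id in banned_ids:
        masked[token_id] = -math.inf
    return masked

def softmax(logits):
    """Numerically stable softmax; assumes at least one logit is finite."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-token vocabulary; id 2 stands in for the "\n" token.
logits = [1.0, 0.5, 3.0, 0.2]
probs = softmax(ban_tokens(logits, banned_ids=[2]))
```

Here the banned token would otherwise be the most likely one, but after masking its probability is exactly zero and the remaining probabilities still sum to one. This only affects generation; it does not shrink the vocabulary or the embedding matrix, which is a separate matter from removing tokens from the tokenizer.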
Thanks for your answers!