Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.

Ban some tokens

AnnaKholkina opened this issue

Hello. I would like the model not to use certain tokens (such as \n). When training a model, can I remove unnecessary tokens from the tokenizer, and how would I do that? And how would removing most tokens affect training quality? (Say I want to train a model that speaks only one language.)
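For context, a common alternative to stripping tokens out of the tokenizer is to ban them at generation time by masking their logits before sampling. A minimal sketch in plain PyTorch (the `ban_tokens` helper and the example token id are illustrative assumptions, not part of lit-llama's API):

```python
import torch

def ban_tokens(logits: torch.Tensor, banned_ids: list[int]) -> torch.Tensor:
    """Mask out banned token ids so they can never be sampled."""
    logits[..., banned_ids] = float("-inf")  # softmax then assigns them probability 0
    return logits

# Toy demonstration with a random vocabulary of 32 tokens.
logits = torch.randn(1, 32)
banned = [13]  # e.g. the newline byte token in the LLaMA vocab (verify against your tokenizer)
logits = ban_tokens(logits, banned)
probs = torch.softmax(logits, dim=-1)
assert probs[0, 13].item() == 0.0
next_token = torch.multinomial(probs, num_samples=1)
```

In a generation loop, the same masking would be applied to the last-position logits at every step before sampling. This leaves the tokenizer and training data untouched, so it sidesteps the quality questions that come with retraining on a reduced vocabulary.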

Thanks for your answers!