adding specific tokens to vocabulary
xinyuwang1126 opened this issue
Xinyu Wang commented
Hi Luowei,
Thanks for sharing this repo! I am trying to adapt it to a specific task. In that task, I want to keep some tokens unsplit (several thousand of them). Is there a way to do that? I tried to add the tokens to the BERT vocabulary file but couldn't find the file. Thanks, and I look forward to your reply!
Luowei Zhou commented
@xinyuwang1126 You can change the vocab by replacing the default vocab file with your customized one. You will then also need to modify the model config file and the checkpoint (both the .bin file and the code) so that the old embeddings are mapped to your new vocab.
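For illustration, the embedding remapping step could look roughly like the sketch below. It is not code from this repo: `load_vocab` and `remap_embeddings` are hypothetical helpers, the embedding matrix is a plain list of lists standing in for the word-embedding weight stored in the .bin checkpoint, and the new token `angiogenesis` is a made-up example. Rows for tokens that already exist are copied to their new positions; rows for new tokens are drawn from a small normal distribution (std 0.02 is assumed here because it is the usual BERT initializer range).

```python
import random


def load_vocab(lines):
    """Map each token to its row index (one token per line, as in a BERT vocab file)."""
    return {line.rstrip("\n"): idx for idx, line in enumerate(lines)}


def remap_embeddings(old_vocab, new_vocab, old_emb, std=0.02, seed=0):
    """Build an embedding table for new_vocab from the rows of old_emb.

    Tokens present in old_vocab keep their trained vectors (moved to the new
    index); tokens that are new get a random vector and must be fine-tuned.
    """
    rng = random.Random(seed)
    dim = len(old_emb[0])
    new_emb = [None] * len(new_vocab)
    for token, new_idx in new_vocab.items():
        old_idx = old_vocab.get(token)
        if old_idx is not None:
            new_emb[new_idx] = list(old_emb[old_idx])  # copy the trained row
        else:
            new_emb[new_idx] = [rng.gauss(0.0, std) for _ in range(dim)]
    return new_emb


# Toy example: a 4-token vocab with 2-dimensional embeddings, extended by one token.
old_vocab = load_vocab(["[PAD]", "[UNK]", "hello", "world"])
new_vocab = load_vocab(["[PAD]", "[UNK]", "hello", "world", "angiogenesis"])
old_emb = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.3], [0.4, 0.5]]
new_emb = remap_embeddings(old_vocab, new_vocab, old_emb)
```

With a real checkpoint, the same loop would run over the word-embedding tensor loaded from the .bin file, and the vocabulary size in the model config (`vocab_size` in standard BERT configs) would need to match `len(new_vocab)`.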
Xinyu Wang commented
got it, thank you!