LuoweiZhou / VLP

Vision-Language Pre-training for Image Captioning and Question Answering

adding specific tokens to vocabulary

xinyuwang1126 opened this issue

Hi Luowei,

Thanks for sharing this repo! I am trying to adapt it to a specific task. For that task, I want to keep some tokens (thousands of them) from being split by the tokenizer. Is there a way to do that? I tried to add the tokens to the BERT vocabulary file but could not find the file. Thanks, and I look forward to your reply!

@xinyuwang1126 You can change the vocab by replacing the default file with your customized vocab file. Then, you will need to modify the model config file and checkpoint (including both the .bin file and code) as well to map the old embeddings to your new vocab.
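For anyone landing here later, the "map the old embeddings to your new vocab" step could look roughly like the sketch below. It is not code from this repo: the helper names (`build_vocab`, `extend_vocab`, `remap_embeddings`) are made up for illustration, plain Python lists stand in for the embedding tensor stored in the .bin checkpoint, and the 0.02 standard deviation for new rows is only an assumption borrowed from BERT's usual initializer range. The idea is to append the new tokens after the existing ones so old indices stay valid, copy the pre-trained row for every token that already existed, and randomly initialize rows for the added tokens.

```python
import random


def build_vocab(tokens):
    """Map each token to its row index, in file order (one token per line in a BERT vocab)."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def extend_vocab(old_vocab, extra_tokens):
    """Append new tokens after the existing ones so every old index is unchanged."""
    new_vocab = dict(old_vocab)
    for tok in extra_tokens:
        if tok not in new_vocab:
            new_vocab[tok] = len(new_vocab)
    return new_vocab


def remap_embeddings(old_vocab, new_vocab, old_rows, hidden_size, seed=0):
    """Build the embedding rows for new_vocab from the rows trained with old_vocab."""
    rng = random.Random(seed)
    new_rows = [None] * len(new_vocab)
    for tok, new_idx in new_vocab.items():
        if tok in old_vocab:
            # Token existed before: reuse its pre-trained row.
            new_rows[new_idx] = list(old_rows[old_vocab[tok]])
        else:
            # Added token: small random init (assumed std of 0.02).
            new_rows[new_idx] = [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
    return new_rows


# Toy example: 4 old tokens with 4-dimensional embeddings, 2 added tokens.
old_vocab = build_vocab(["[PAD]", "[UNK]", "cat", "##s"])
old_rows = [[float(i)] * 4 for i in range(len(old_vocab))]
new_vocab = extend_vocab(old_vocab, ["cats", "zebra_crossing"])
new_rows = remap_embeddings(old_vocab, new_vocab, old_rows, hidden_size=4)
```

With a real checkpoint, the same loop would run over the word-embedding tensor loaded from the .bin file, the vocabulary size in the model config would need to match `len(new_vocab)`, and any output layer or bias whose size depends on the vocabulary would probably need the same remapping.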

got it, thank you!