custom tokenizer and text encoder

Question

sinjohr opened this issue 3 years ago

I want to use a custom tokenizer, trained with the Hugging Face tokenizers library, together with a custom text encoder.

After training the tokenizer, I got a JSON file containing the vocabulary.
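The following is a minimal sketch for inspecting that JSON file. It assumes the file is the tokenizer.json written by the Hugging Face tokenizers library (where the vocabulary sits under "model" -> "vocab"). It uses only the standard library to extract the token-to-id mapping and the vocabulary size, which a text encoder trained from scratch needs for its embedding table. The helper names load_vocab and vocab_size are hypothetical.

```python
import json
from typing import Dict


def load_vocab(path: str) -> Dict[str, int]:
    """Return the token -> id mapping stored in a tokenizers-style tokenizer.json.

    Assumption: the file uses the layout written by the Hugging Face
    `tokenizers` library, with the vocabulary under "model" -> "vocab".
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    vocab = data["model"]["vocab"]
    if isinstance(vocab, list):
        # Unigram models store a list of [token, score] pairs; the id is the index.
        mapping = {token: idx for idx, (token, _score) in enumerate(vocab)}
    else:
        # BPE, WordPiece and WordLevel models store a {token: id} dict.
        mapping = dict(vocab)

    # Tokens added after training (e.g. special tokens) are listed with explicit ids.
    for added in data.get("added_tokens", []):
        mapping[added["content"]] = added["id"]
    return mapping


def vocab_size(mapping: Dict[str, int]) -> int:
    # Embedding tables are indexed by id, so size them by the largest id + 1.
    return max(mapping.values()) + 1 if mapping else 0
```

For actual encoding, the same file can be loaded with tokenizers.Tokenizer.from_file(path). Its encode(text).ids returns the token ids, and get_vocab_size() reports the vocabulary size directly.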

However, I don't know how to use this custom tokenizer with train_finetune.py.

Could you give some guidance on how to set up and use a custom tokenizer?
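The thread does not state how train_finetune.py calls its tokenizer, so the following is only a hypothetical adapter. Many text-encoder training scripts expect a function that maps a batch of strings to fixed-length id sequences. The sketch assumes you supply an encode callable, for example lambda s: tok.encode(s).ids with tok = tokenizers.Tokenizer.from_file("tokenizer.json"), and a padding id taken from your vocabulary. The names make_tokenize_fn, context_length and pad_id are illustrative, not part of any existing script.

```python
from typing import Callable, List


def make_tokenize_fn(
    encode: Callable[[str], List[int]],
    context_length: int,
    pad_id: int,
) -> Callable[[List[str]], List[List[int]]]:
    """Build a batch tokenizer that truncates/pads every sequence to context_length.

    `encode` is assumed to turn one string into a list of token ids, e.g.
    a thin wrapper around a Hugging Face `tokenizers.Tokenizer`.
    """

    def tokenize(texts: List[str]) -> List[List[int]]:
        batch = []
        for text in texts:
            ids = encode(text)[:context_length]  # truncate long inputs
            ids = ids + [pad_id] * (context_length - len(ids))  # right-pad short ones
            batch.append(ids)
        return batch

    return tokenize
```

Wiring this into a training script would mean replacing the script's existing tokenizer call with such a function. It would also mean sizing the text encoder's token embedding to the custom vocabulary. The exact places to change depend on train_finetune.py itself.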

tonyhuang33 · Answer 1 · Wed Jul 06 2022 16:34:44 GMT+0800 (China Standard Time)

I have the same problem. Please reply if you solve it. Thank you.