dmlc / gluon-nlp

NLP made easy

Home Page: https://nlp.gluon.ai/


nlp.models.bert.get_pretrained_bert provides slow tokenizer

leezu opened this issue · comments

It returns a `LegacyHuggingFaceTokenizer` instead of a fast tokenizer backed by https://github.com/huggingface/tokenizers

I think both use the `tokenizers` package, but the legacy one follows the API of an older version of HF tokenizers.

`transformers.PreTrainedTokenizer` is the "Base class for all slow tokenizers," whereas `transformers.PreTrainedTokenizerFast` is the "Base class for all fast tokenizers (wrapping HuggingFace tokenizers library)," so only the latter uses HF tokenizers.
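The slow/fast split in `transformers` can be checked directly from the class hierarchy (a minimal sketch, assuming `transformers` is installed; `BertTokenizer`/`BertTokenizerFast` are used here just as representative examples):

```python
from transformers import (
    BertTokenizer,
    BertTokenizerFast,
    PreTrainedTokenizer,
    PreTrainedTokenizerFast,
)

# BertTokenizer is a "slow" pure-Python tokenizer.
assert issubclass(BertTokenizer, PreTrainedTokenizer)

# BertTokenizerFast wraps the Rust `tokenizers` library.
assert issubclass(BertTokenizerFast, PreTrainedTokenizerFast)

# The two base classes are siblings (both derive from
# PreTrainedTokenizerBase), so a fast tokenizer is NOT a
# subclass of the slow base class.
assert not issubclass(BertTokenizerFast, PreTrainedTokenizer)
```

At runtime, a loaded tokenizer also exposes an `is_fast` attribute that reports which backend it uses.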

We are calling the tokenizers package directly in the implementation.