dmlc / gluon-nlp

NLP made easy

Home Page: https://nlp.gluon.ai/


nlp.models.bert.get_pretrained_bert provides slow tokenizer

leezu opened this issue · comments

It returns a `LegacyHuggingFaceTokenizer` instead of a fast tokenizer backed by https://github.com/huggingface/tokenizers

I think both use the `tokenizers` package, but the legacy one follows the API of an older version of HF tokenizers.

`transformers.PreTrainedTokenizer` is the "Base class for all slow tokenizers," whereas `transformers.PreTrainedTokenizerFast` is the "Base class for all fast tokenizers (wrapping HuggingFace tokenizers library)," so only the latter uses HF tokenizers.
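The slow/fast split in `transformers` can be checked directly from the class hierarchy (a minimal sketch, assuming `transformers` is installed; `BertTokenizer`/`BertTokenizerFast` are used here just as representative examples):

```python
from transformers import (
    BertTokenizer,
    BertTokenizerFast,
    PreTrainedTokenizer,
    PreTrainedTokenizerFast,
)

# BertTokenizer is a "slow" pure-Python tokenizer.
assert issubclass(BertTokenizer, PreTrainedTokenizer)

# BertTokenizerFast wraps the Rust `tokenizers` library.
assert issubclass(BertTokenizerFast, PreTrainedTokenizerFast)

# The two base classes are siblings (both derive from
# PreTrainedTokenizerBase), so a fast tokenizer is NOT a
# subclass of the slow base class.
assert not issubclass(BertTokenizerFast, PreTrainedTokenizer)
```

At runtime, a loaded tokenizer also exposes an `is_fast` attribute that reports which backend it uses.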

We are calling the tokenizers package directly in the implementation.