Speed up tokenize.
emfomy opened this issue
Mu Yang commented
Hugging Face's fast tokenizers can also return each token's character offsets in the original text (`return_offsets_mapping=True`).
We could rewrite the tokenization step to use these offsets instead of tokenizing character by character.
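Here is a minimal sketch of the offset-based idea. It assumes offsets shaped like a fast tokenizer's `offset_mapping`: one `(start, end)` character span per token, with `(0, 0)` for special tokens. The helper `spread_token_labels` and the sample offsets are hypothetical and are not part of the project. The sketch only shows how per-token predictions could be mapped back onto the original characters.

```python
from typing import List, Optional, Sequence, Tuple


def spread_token_labels(
    text: str,
    offsets: Sequence[Tuple[int, int]],
    token_labels: Sequence[str],
) -> List[Optional[str]]:
    """Copy each token's label onto every character it covers.

    Hypothetical helper. `offsets` is assumed to look like a fast
    tokenizer's `offset_mapping`: one (start, end) span per token,
    with (0, 0) for special tokens such as [CLS] and [SEP].
    Characters not covered by any token (e.g. spaces) stay None.
    """
    char_labels: List[Optional[str]] = [None] * len(text)
    for (start, end), label in zip(offsets, token_labels):
        if start == end:  # special token: covers no characters
            continue
        for i in range(start, end):
            char_labels[i] = label
    return char_labels


# Hypothetical offsets for "hello world" tokenized as [CLS] hello world [SEP].
text = "hello world"
offsets = [(0, 0), (0, 5), (6, 11), (0, 0)]
labels = ["O", "A", "B", "O"]
print(spread_token_labels(text, offsets, labels))
```

The space at index 5 belongs to no token, so it keeps the placeholder `None`.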
Mu Yang commented
Instead, use the tokenizer without calling `tokenize`, converting the text to IDs character by character.
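Here is a minimal sketch of the character-by-character approach. It assumes a plain dict vocabulary with an `[UNK]` entry. The function `chars_to_ids` and the toy vocabulary are hypothetical, and this is not the v0.2.0 code. With a Hugging Face tokenizer, the analogous call would be roughly `tokenizer.convert_tokens_to_ids(list(text))`.

```python
from typing import Dict, List


def chars_to_ids(text: str, vocab: Dict[str, int], unk_token: str = "[UNK]") -> List[int]:
    """Map each character straight to a vocabulary ID, skipping tokenize().

    Hypothetical helper. Every character becomes exactly one ID, so the
    i-th ID corresponds to the i-th character. Characters missing from
    the vocabulary fall back to the UNK ID.
    """
    unk_id = vocab[unk_token]
    return [vocab.get(ch, unk_id) for ch in text]


# Hypothetical toy vocabulary; a real one would come from the model's vocab file.
vocab = {"[UNK]": 1, "中": 2, "文": 3}
print(chars_to_ids("中文字", vocab))  # → [2, 3, 1]
```

Each character yields exactly one ID, so the output aligns one-to-one with the input text. No offset bookkeeping is needed to map predictions back to characters.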
Mu Yang commented
Implemented in v0.2.0.