strip_accents should be None by default in WordPiece
sxjscience opened this issue
Description
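@leezu @szha @xinyual I noticed that we may need to set strip_accents to None in … so that accents are stripped when lowercase is True. This may affect model performance: the original BERT tokenizer strips accents after lowercasing, so uncased checkpoints were pretrained on accent-free text.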
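A minimal sketch of the intended semantics, assuming the convention documented for BertNormalizer in the HuggingFace tokenizers library, where strip_accents=None defers to the lowercase flag. The bert_normalize and remove_accents helpers are hypothetical illustrations built on Python's standard unicodedata module, not the gluon-nlp or tokenizers API:

```python
import unicodedata
from typing import Optional


def remove_accents(text: str) -> str:
    # NFD splits accented letters into base letter + combining mark (category "Mn");
    # dropping the marks removes the accents, as the original BERT tokenizer does.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def bert_normalize(text: str, lowercase: bool = True,
                   strip_accents: Optional[bool] = None) -> str:
    # Hypothetical helper, not a library API.
    # Tri-state option: None means "follow lowercase"; True/False override it.
    do_strip = lowercase if strip_accents is None else strip_accents
    if lowercase:
        text = text.lower()
    if do_strip:
        text = remove_accents(text)
    return text


print(bert_normalize("Möchte"))                       # default None -> stripped
print(bert_normalize("Möchte", strip_accents=False))  # accents kept explicitly
```

With the default of None, an uncased setup (lowercase=True) strips accents, while passing strip_accents=False keeps them even when lowercasing.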
However, accents carry meaning in many languages, e.g., mochte vs. möchte. Thus, we may want to turn accent stripping off in nlp_process.
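To illustrate the concern, here is a minimal sketch using Python's standard unicodedata module, assuming accent stripping works as in the original BERT tokenizer (NFD decomposition followed by dropping combining marks). The remove_accents helper is a hypothetical illustration:

```python
import unicodedata


def remove_accents(text: str) -> str:
    # Hypothetical helper: NFD splits "ö" into "o" + U+0308 COMBINING DIAERESIS
    # (category "Mn"); dropping "Mn" characters removes the accent.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# "mochte" (past tense of "mögen", "liked") and "möchte" ("would like")
# are distinct German words.
words = ["mochte", "möchte"]
stripped = {remove_accents(w) for w in words}
print(stripped)  # {'mochte'}: both words collapse to the same token
```

After stripping, the model can no longer distinguish the two words, which is why keeping accents matters for languages like German.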
Do you mean exposing an option in nlp_process, or changing the defaults in nlp_process? Since English is a special case where accents matter little, I suggest we at least keep an option to preserve accents in nlp_process.