Can I use `rinna/japanese-roberta-base` through `AutoTokenizer` ?
shunk031 opened this issue
Hi, thank you very much for publishing such a wonderful Japanese pre-trained model! I am very happy to use this model.
I would like to load the pre-trained tokenizer via `AutoTokenizer.from_pretrained`, but I encountered the following error. Do you support loading the pre-trained tokenizer from `AutoTokenizer.from_pretrained`?
```console
$ python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('rinna/japanese-roberta-base')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/shunk031/.pyenv/versions/japanese-dev/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 423, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/shunk031/.pyenv/versions/japanese-dev/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1709, in from_pretrained
    return cls._from_pretrained(
  File "/home/shunk031/.pyenv/versions/japanese-dev/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1722, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
  File "/home/shunk031/.pyenv/versions/japanese-dev/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1781, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/shunk031/.pyenv/versions/japanese-dev/lib/python3.8/site-packages/transformers/models/roberta/tokenization_roberta.py", line 159, in __init__
    super().__init__(
  File "/home/shunk031/.pyenv/versions/japanese-dev/lib/python3.8/site-packages/transformers/models/gpt2/tokenization_gpt2.py", line 179, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType
```
Fortunately, `AutoModel.from_pretrained` runs successfully (the warning message can be ignored this time).
```console
$ python -c "from transformers import AutoModel; AutoModel.from_pretrained('rinna/japanese-roberta-base')"
Some weights of RobertaModel were not initialized from the model checkpoint at rinna/japanese-roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```
The following is my system environment:
- python 3.8.8
- transformers 4.5.1
I would appreciate any advice on how to load it this way. Thanks.
The above problem was caused by an old version of transformers. After updating to transformers 4.9.2, it was resolved. Thanks for the great project!