rinnakk / japanese-pretrained-models

Code for producing Japanese pretrained models provided by rinna Co., Ltd.

Home Page: https://huggingface.co/rinna

Can I use `rinna/japanese-roberta-base` through `AutoTokenizer` ?

shunk031 opened this issue

Hi, thank you very much for publishing such a wonderful Japanese pre-trained model! I am very happy to use this model.

I would like to load the pretrained tokenizer with `AutoTokenizer.from_pretrained`, but I encountered the following error. Is loading the tokenizer via `AutoTokenizer.from_pretrained` supported?

$ python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('rinna/japanese-roberta-base')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/shunk031/.pyenv/versions/japanese-dev/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 423, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/shunk031/.pyenv/versions/japanese-dev/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1709, in from_pretrained
    return cls._from_pretrained(
  File "/home/shunk031/.pyenv/versions/japanese-dev/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1722, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
  File "/home/shunk031/.pyenv/versions/japanese-dev/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1781, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/shunk031/.pyenv/versions/japanese-dev/lib/python3.8/site-packages/transformers/models/roberta/tokenization_roberta.py", line 159, in __init__
    super().__init__(
  File "/home/shunk031/.pyenv/versions/japanese-dev/lib/python3.8/site-packages/transformers/models/gpt2/tokenization_gpt2.py", line 179, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType

Fortunately, `AutoModel.from_pretrained` runs successfully (the warning message below can be ignored in this case).

$ python -c "from transformers import AutoModel; AutoModel.from_pretrained('rinna/japanese-roberta-base')"
Some weights of RobertaModel were not initialized from the model checkpoint at rinna/japanese-roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

The following is my system environment:

  • python 3.8.8
  • transformers 4.5.1

I would appreciate any advice on how to load the tokenizer this way. Thanks.

The above problem was caused by an old version of `transformers`. After updating to `transformers` 4.9.2, the problem was resolved. Thanks for the great project!
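As a quick sanity check that the automatic path now works on `transformers` >= 4.9, the one-liner from the original report can be rerun; a sketch, assuming Hub access (the exact tokens produced are not shown here, as they depend on the SentencePiece model):

```python
# Sanity-check sketch: on transformers >= 4.9, AutoTokenizer resolves the
# correct tokenizer class from the Hub config. Assumes network access.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-roberta-base")
tokens = tokenizer.tokenize("こんにちは、世界。")
print(tokens)  # SentencePiece subword tokens
```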