princeton-nlp / LM-BFF

[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723

Which pretrained models can be used with this codebase?

rtaori opened this issue

First of all - thanks for this work! It is super nice and very helpful.

I would like to fine-tune a number of different pretrained model bases and collect results. The description in the README states
"roberta-large can also be replaced by bert-base, bert-large, roberta-base and distilbert-base". However, of the four, only roberta-base seems to work out of the box. For the other three, I get an error that looks like this:

OSError: Can't load config for 'bert-large'. Make sure that:
- 'bert-large' is a correct model identifier listed on 'https://huggingface.co/models'

It does indeed seem that bert-large isn't listed on the site, so I tried the closest thing I could find, bert-large-cased, which gives the following error:

Traceback (most recent call last):
  File "run.py", line 623, in <module>
    main()
  File "run.py", line 476, in main
    resize_token_type_embeddings(model, new_num_types=10, random_segment=model_args.random_segment)
AttributeError: 'ModelArguments' object has no attribute 'random_segment'

(Same story holds for bert-base-cased, but distilbert-base-cased seems to work).

Do you have a list of a few models that I can simply plug in as a command-line option and expect to work? I am not particularly set on the models mentioned in this issue; any other pretrained models would work as well.
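
For what it's worth, a quick way I found to check whether a name is a valid Hub identifier before passing it to the script is to try loading just its config. This is a generic transformers snippet, not part of this codebase, and the candidate names are only examples:

from transformers import AutoConfig

# Candidate identifiers; "bert-large" is not on the Hub, while "bert-large-cased" is.
candidates = ["bert-large", "bert-large-cased", "roberta-base", "distilbert-base-cased"]

for name in candidates:
    try:
        AutoConfig.from_pretrained(name)  # only fetches the small config.json
        print(f"{name}: OK")
    except OSError as err:
        print(f"{name}: not found ({err})")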

Thanks for the help in advance,
Rohan

Hi Rohan, thanks for your feedback! This random_segment issue is indeed a bug, and I have just fixed it. Also,

roberta-large can also be replaced by bert-base, bert-large, roberta-base and distilbert-base

This description is for generating embeddings using SBERT instead of our main model. I just added one sentence describing pre-trained models you can directly use here:

Also, this codebase supports BERT-series and RoBERTa-series pre-trained models from Hugging Face's transformers. You can check the Hugging Face website for available models and pass any model whose name contains "bert" or "roberta" to --model_name_or_path. Some examples would be bert-base-uncased, bert-large-uncased, roberta-base, roberta-large, etc.

Here are some examples: bert-base-uncased, bert-base-cased, bert-large-uncased, bert-large-cased, roberta-base, roberta-large.
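
For context, the traceback above comes from run.py reading model_args.random_segment while the ModelArguments dataclass did not define that field, so the fix adds such a field. A minimal sketch of that kind of change follows; the default value and help text here are illustrative only and may differ from the actual commit:

from dataclasses import dataclass, field

@dataclass
class ModelArguments:
    # ... existing fields such as model_name_or_path omitted ...

    # Flag that run.py reads when resizing token type embeddings for BERT-style models.
    random_segment: bool = field(
        default=False,
        metadata={"help": "Whether to randomly re-initialize the token type (segment) embeddings"},
    )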

Hi @gaotianyu1350 ,

Thanks for fixing this! I can confirm that I am now able to use bert-large-cased and fine-tune with it successfully. I also ran into another issue with zero-shot evaluation, but I will close this issue and open a new one.