Which pretrained models can be used with this codebase?
rtaori opened this issue
First of all - thanks for this work! It is super nice and very helpful.
I would like to finetune a bunch of different pretrained model bases and collect results. The description in the README states that "roberta-large can also be replaced by bert-base, bert-large, roberta-base and distilbert-base". However, out of the four, only roberta-base seems to work out of the box. For the other three, I get an error that looks like this:
OSError: Can't load config for 'bert-large'. Make sure that:
- 'bert-large' is a correct model identifier listed on 'https://huggingface.co/models'
It does indeed seem that bert-large isn't listed on the site, so I tried the closest thing I could find, bert-large-cased, which gives the following error:
Traceback (most recent call last):
File "run.py", line 623, in <module>
main()
File "run.py", line 476, in main
resize_token_type_embeddings(model, new_num_types=10, random_segment=model_args.random_segment)
AttributeError: 'ModelArguments' object has no attribute 'random_segment'
(The same holds for bert-base-cased, but distilbert-base-cased seems to work.)
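As a side note, the way I checked whether a given name actually resolves on the Hub was just to try loading its config. A minimal sketch (it only fetches the small config file, and it assumes that unknown identifiers raise an OSError, matching the message above):

```python
from transformers import AutoConfig

# Names from the README plus the closest matches I could find on the Hub.
candidates = ["bert-base", "bert-large", "bert-base-cased",
              "bert-large-cased", "roberta-base", "distilbert-base-cased"]

for name in candidates:
    try:
        AutoConfig.from_pretrained(name)
        print(f"{name}: found")
    except OSError:
        print(f"{name}: not a valid model identifier")
```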
Do you have a list of a few models which I can simply plug in as a command line option and expect them to work? I am not particularly set on the list of models in this issue - any other pretrained models would work as well.
Thanks for the help in advance,
Rohan
Hi Rohan, thanks for your feedback! This random_segment thing is indeed a bug and I have just fixed it.
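For anyone who hits the same AttributeError before pulling, the change amounts to declaring the missing field on the ModelArguments dataclass so that model_args.random_segment exists. A minimal sketch of that kind of fix (illustrative only, not necessarily the exact commit; the default value here is an assumption):

```python
from dataclasses import dataclass, field

@dataclass
class ModelArguments:
    model_name_or_path: str = field(
        metadata={"help": "Pretrained model name or path from huggingface.co/models"}
    )
    # The attribute run.py reads when resizing token type embeddings.
    # Defaulting to False is an assumption that keeps the non-random behavior.
    random_segment: bool = field(
        default=False,
        metadata={"help": "Whether to randomly initialize the new segment embeddings"}
    )
```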
As for this sentence from the README, "roberta-large can also be replaced by bert-base, bert-large, roberta-base and distilbert-base": this description is for generating embeddings using SBERT instead of our main model. I just added a sentence describing the pre-trained models you can use directly:
Also, this codebase supports BERT-series and RoBERTa-series pre-trained models in Huggingface's transformers. You can check Huggingface's website for available models and pass models with "bert" or "roberta" in their names to --model_name_or_path. Some examples would be bert-base-uncased, bert-large-uncased, roberta-base, roberta-large, etc.
Here are some examples: bert-base-uncased, bert-base-cased, bert-large-uncased, bert-large-cased, roberta-base, roberta-large.
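If you want to sanity-check a name before starting a fine-tuning run, something like the following works (just a sketch; note that it downloads the full pretrained weights for each name):

```python
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Confirm that each identifier resolves on the Hub and loads cleanly.
# Any other BERT-/RoBERTa-family name should behave the same way.
for name in ["bert-base-uncased", "bert-base-cased", "bert-large-uncased",
             "bert-large-cased", "roberta-base", "roberta-large"]:
    config = AutoConfig.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    print(f"{name}: {type(model).__name__}, hidden_size={config.hidden_size}")
```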
Hi @gaotianyu1350,
Thanks for fixing this! I can confirm that I am now able to use bert-large-cased and successfully finetune with it. I also came across another issue with zero-shot evaluation, but I will close this issue and open a new one.