Questions about updating the model weights and config files
3A732038 opened this issue
Excuse me, when I used this prefix-tuning-gpt model before, it ran fine.
But as of 2023/3/20, I found that the Hugging Face website says the model has been updated, which causes a warning when I run this code (shown below):
You are using a model of type gpt_neox to instantiate a model of type gpt-neox. This is not supported for all configurations of models and can yield errors.
Is there any way to fix this?
base_model = GPTNeoXForCausalLM.from_pretrained(config.pretrained_model_dir)
Thanks for your help.
Later, I loaded base_model directly with the GPTNeoXForCausalLM from the Hugging Face Model Hub and changed some code; the warning disappeared and the loss and ppl were normal.
I will try your updated project again, thanks again~
@ZHAOTING Sorry, I found that line 73 of prefix_inference.py passes position_ids, but the forward function of the GPTNeoXForCausalLM you reference does not accept a position_ids parameter, which causes a warning at execution. Is this intended?
Good point!
You are right. It is a legacy from a previous version of the gpt-neox modeling code.
Since the Hugging Face implementation of gpt-neox does not support explicitly providing position ids as input, you can safely remove position_ids from the model inputs to get rid of the warning.
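A minimal sketch of that fix (the helper name strip_unsupported_inputs is invented here for illustration; prefix_inference.py may organize its inputs differently): filter position_ids out of the keyword arguments before the forward call.

```python
# Illustrative only: GPTNeoXForCausalLM's forward() does not accept
# position_ids, so we drop it from the inputs dict before calling the model.
def strip_unsupported_inputs(model_inputs, unsupported=("position_ids",)):
    """Return a copy of model_inputs without keys the model does not accept."""
    return {k: v for k, v in model_inputs.items() if k not in unsupported}

inputs = {
    "input_ids": [[1, 2, 3]],
    "attention_mask": [[1, 1, 1]],
    "position_ids": [[0, 1, 2]],  # ignored by gpt-neox; triggers the warning
}
clean = strip_unsupported_inputs(inputs)
print(sorted(clean))  # ['attention_mask', 'input_ids']
```

The model would then be called as model(**clean) instead of model(**inputs).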
Thank you for your reply. I still have two questions:
-
Looking at the model structure, there seems to be no layer dedicated to positional information, so I would like to ask where position information is encoded in the model.
-
In addition, if I want to change my pre-trained model from gpt-neox to gpt, should I replace
GPTNeoXForCausalLM.from_pretrained('rinna/japanese-gpt-neox-small')
with
GPT2LMHeadModel.from_pretrained('gpt2')
and also switch the tokenizer to the corresponding tokenizer of the gpt2 model?
Would that avoid the model-mismatch problem and give better training results?
I ask because when I tested your program with gpt, i.e. executing
GPT2LMHeadModel.from_pretrained('rinna/japanese-gpt-neox-small')
the following error occurs:
You are using a model of type gpt_neox to instantiate a model of type gpt2. This is not supported for all configurations of models and can yield errors.
- In gpt2, a learned position embedding (called wpe) encodes position information. In gpt-neox, rotary embedding (search for the keyword rotary) encodes it.
- You should change both the model class/name and the tokenizer class/name. However, if you use AutoModelForCausalLM and AutoTokenizer, you only need to change the model name and tokenizer name. For example:
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
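The rotary embedding mentioned above can be illustrated with a toy sketch (this is a simplified single-pair version, not the actual transformers implementation): each (even, odd) feature pair of the query/key vectors is rotated by an angle that grows with the token position, so position enters through the attention dot products rather than through an added embedding layer.

```python
import math

def rotary_pair(x0, x1, pos, pair_idx=0, dim=64, base=10000.0):
    """Rotate one (even, odd) feature pair by an angle proportional to the
    token position -- the core idea of rotary position embedding."""
    freq = base ** (-2.0 * pair_idx / dim)  # lower-index pairs rotate faster
    angle = pos * freq
    c, s = math.cos(angle), math.sin(angle)
    return (x0 * c - x1 * s, x0 * s + x1 * c)

# Position 0 leaves the features unchanged (angle = 0) ...
print(rotary_pair(1.0, 0.0, pos=0))  # (1.0, 0.0)

# ... and rotation preserves the vector norm, so only the *relative*
# rotation between query and key positions affects their dot product.
a, b = rotary_pair(3.0, 4.0, pos=7)
print(round(math.hypot(a, b), 6))  # 5.0
```

This is why the model has no separate position-embedding layer to point at: the positional signal lives inside the attention computation itself.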
So if I use AutoModelForCausalLM and AutoTokenizer, can I use prefix_tuning directly?
Or should I just change the model name passed to the GPTNeoXForCausalLM class?
In addition, I tried setting pretrained_model_dir to something else, such as EleutherAI/pythia-160m, but then the model outputs in forward_step contain nan values:
model_outputs = model(
    input_ids=input_ids,
    past_key_values=past_key_values_prompt,
    return_dict=True
)
Is there any solution? Thank you.
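As a first diagnostic (a generic sketch, not specific to this repository), it can help to locate where the NaNs first appear in a flattened tensor; one common cause with other checkpoints is reduced-precision (fp16) overflow, so running in fp32 may be worth trying.

```python
import math

def first_nan(values):
    """Return the index of the first NaN in a flat sequence of floats,
    or None if no NaN is present."""
    for i, v in enumerate(values):
        if math.isnan(v):
            return i
    return None

# In practice you would pass e.g. model_outputs.logits.flatten().tolist().
print(first_nan([0.2, -1.5, float("nan"), 3.0]))  # 2
print(first_nan([0.0, 1.0]))                      # None
```

Checking the logits right after the forward call narrows down whether the NaNs originate in the model itself or in the loss computation that follows.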