rinnakk / prefix-tuning-gpt

Example code for prefix-tuning GPT/GPT-NeoX models and for inference with trained prefixes

Home Page: https://huggingface.co/rinna


Questions about updating the model weight and config files

3A732038 opened this issue

Excuse me, when I previously used this prefix-tuning-gpt code, it ran without problems.

But as of 2023/3/20, I found that the Hugging Face website says the model has been updated, which causes a warning to appear when I run this code (as shown below): You are using a model of type gpt_neox to instantiate a model of type gpt-neox. This is not supported for all configurations of models and can yield errors.

I would like to ask if there is any way to solve it?

base_model = GPTNeoXForCausalLM.from_pretrained(config.pretrained_model_dir)

Hi @3A732038, I have submitted a new commit bd6027b to make the code work with the updated model from the Hugging Face Model Hub. Please check the update log.

Thanks for your help.

Later, I loaded the base_model directly with the GPTNeoXForCausalLM class from the Hugging Face Model Hub and changed some code; the error disappeared, and the loss and ppl were normal.

I will try your updated project again, thanks again~

@ZHAOTING Sorry, I found that you use position_ids on line 73 of prefix_inference.py, but the forward function of the GPTNeoXForCausalLM you reference does not take a position_ids parameter, which causes a warning when it is executed. So I would like to ask whether this is intentional?

Good point!
You are right. It is a legacy of a previous version of the gpt-neox modeling code.
Since the Hugging Face implementation of gpt-neox does not support explicitly providing position ids as inputs, you can safely remove the position ids from the model inputs to get rid of the warning.
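
For example, a minimal sketch of that change (the surrounding variable names, input_ids and past_key_values_prompt, are taken from the snippets in this thread and may differ from the actual code):

# Before (legacy call; position_ids is ignored and triggers a warning):
# model_outputs = model(
#     input_ids=input_ids,
#     position_ids=position_ids,
#     past_key_values=past_key_values_prompt,
#     return_dict=True,
# )

# After: drop position_ids; gpt-neox handles position information internally
model_outputs = model(
    input_ids=input_ids,
    past_key_values=past_key_values_prompt,
    return_dict=True,
)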

Thank you for your reply. I still have two questions:

  1. I looked at your model structure, and there seems to be no layer dedicated to positional information, so I would like to ask where positional information is encoded in the model.

  2. In addition, if I want to change my pre-trained model from gpt-neox to gpt, should I replace
    GPTNeoXForCausalLM.from_pretrained('rinna/japanese-gpt-neox-small')
    with
    GPT2LMHeadModel.from_pretrained('gpt2')
    and also switch the tokenizer to the corresponding gpt2 tokenizer?
    Would this avoid the model-mismatch problem and lead to better training results?

I ask because I tested your program: if I choose gpt and then execute
GPT2LMHeadModel.from_pretrained('rinna/japanese-gpt-neox-small')
the following error occurs:
You are using a model of type gpt_neox to instantiate a model of type gpt2. This is not supported for all configurations of models and can yield errors.

  1. In gpt2, a learned position embedding (called wpe) encodes positional information. In gpt-neox, rotary embedding (search for the keyword rotary) encodes it (see the sketch after this list).
  2. You should change both the model class/name and the tokenizer class/name. However, if you use AutoModelForCausalLM and AutoTokenizer, you only need to change the model name and tokenizer name. For example
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
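
One way to see this difference for yourself (a sketch; the exact attribute path to rotary_emb depends on your transformers version):

from transformers import GPT2LMHeadModel, GPTNeoXForCausalLM

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
# gpt2 stores a learned absolute position embedding table named wpe
print(gpt2.transformer.wpe)  # e.g. Embedding(1024, 768)

neox = GPTNeoXForCausalLM.from_pretrained("rinna/japanese-gpt-neox-small")
# gpt-neox has no wpe; rotary embeddings are applied inside each attention layer
print(neox.gpt_neox.layers[0].attention.rotary_emb)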
    

So if I use AutoModelForCausalLM and AutoTokenizer, can I use prefix_tuning directly?
Or do I need to keep using the GPTNeoXForCausalLM class and only change the model name?

In addition, I tried setting pretrained_model_dir to something else, such as EleutherAI/pythia-160m, but the model call below in the forward_step function produces nan values:
model_outputs = model(input_ids=input_ids, past_key_values=past_key_values_prompt, return_dict=True)
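
For reference, a minimal sketch for locating where the nan first appears (variable names follow the snippet above; this is a debugging aid, not code from the repository):

import torch

# Check whether the nan already exists in the prefix past key/values...
for layer_idx, (k, v) in enumerate(past_key_values_prompt):
    if torch.isnan(k).any() or torch.isnan(v).any():
        print(f"nan in prefix key/values at layer {layer_idx}")

# ...or only appears in the model outputs
model_outputs = model(input_ids=input_ids, past_key_values=past_key_values_prompt, return_dict=True)
print("nan in logits:", torch.isnan(model_outputs.logits).any().item())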

So I would like to ask if there is any solution, thank you.