rinnakk / prefix-tuning-gpt

Example code for prefix-tuning GPT/GPT-NeoX models and for inference with trained prefixes

Home Page: https://huggingface.co/rinna


Questions about updating the model weight and config files

3A732038 opened this issue

Excuse me, when I previously used this prefix-tuning-gpt code, it ran without problems.

But as of 2023/3/20, I found that the Hugging Face website says the model has been updated, which causes a warning to appear when I run this code (as shown below): You are using a model of type gpt_neox to instantiate a model of type gpt-neox. This is not supported for all configurations of models and can yield errors.

I would like to ask if there is any way to solve it?

base_model = GPTNeoXForCausalLM.from_pretrained(config.pretrained_model_dir)

Hi @3A732038, I have submitted a new commit bd6027b to make the code work with the updated model from the Hugging Face Model Hub. Please check the update log.

Thanks for your help.

Later, I loaded the base_model directly with the GPTNeoXForCausalLM class from the Hugging Face Model Hub and changed some code; the error disappeared, and the loss and ppl were normal.

I will try your updated project again, thanks again~

@ZHAOTING Sorry, I found that you use position_ids on line 73 of prefix_inference.py, but the forward function of the GPTNeoXForCausalLM you reference does not take a position_ids parameter, which causes a warning when it is executed. So I would like to ask whether this is intentional?

Good point!
You are right. It is a legacy of a previous version of the gpt-neox modeling code.
Since the Hugging Face implementation of gpt-neox does not support explicitly providing position ids as inputs, you can safely remove the position ids from the model inputs to get rid of the warning.
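
For example, a minimal sketch of that change (the surrounding variable names, input_ids and past_key_values_prompt, are taken from the snippets in this thread and may differ from the actual code):

# Before (legacy call; position_ids is ignored and triggers a warning):
# model_outputs = model(
#     input_ids=input_ids,
#     position_ids=position_ids,
#     past_key_values=past_key_values_prompt,
#     return_dict=True,
# )

# After: drop position_ids; gpt-neox handles position information internally
model_outputs = model(
    input_ids=input_ids,
    past_key_values=past_key_values_prompt,
    return_dict=True,
)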

Thank you for your reply. I still have two questions:

  1. I looked at your model structure, and there seems to be no layer dedicated to positional information, so I would like to ask where positional information is encoded in the model.

  2. In addition, if I want to change my pre-trained model from gpt-neox to gpt, should I replace
    GPTNeoXForCausalLM.from_pretrained('rinna/japanese-gpt-neox-small')
    with
    GPT2LMHeadModel.from_pretrained('gpt2')
    and also switch the tokenizer to the corresponding gpt2 tokenizer?
    Would this avoid the model-mismatch problem and lead to better training results?

I ask because I tested your program: if I choose gpt and then execute
GPT2LMHeadModel.from_pretrained('rinna/japanese-gpt-neox-small')
the following error occurs:
You are using a model of type gpt_neox to instantiate a model of type gpt2. This is not supported for all configurations of models and can yield errors.

  1. In gpt2, a learned position embedding (called wpe) encodes positional information. In gpt-neox, rotary embedding (search for the keyword rotary) encodes it (see the sketch after this list).
  2. You should change both the model class/name and the tokenizer class/name. However, if you use AutoModelForCausalLM and AutoTokenizer, you only need to change the model name and tokenizer name. For example
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
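
One way to see this difference for yourself (a sketch; the exact attribute path to rotary_emb depends on your transformers version):

from transformers import GPT2LMHeadModel, GPTNeoXForCausalLM

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
# gpt2 stores a learned absolute position embedding table named wpe
print(gpt2.transformer.wpe)  # e.g. Embedding(1024, 768)

neox = GPTNeoXForCausalLM.from_pretrained("rinna/japanese-gpt-neox-small")
# gpt-neox has no wpe; rotary embeddings are applied inside each attention layer
print(neox.gpt_neox.layers[0].attention.rotary_emb)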
    

So if I use AutoModelForCausalLM and AutoTokenizer, can I use prefix_tuning directly?
Or do I need to keep using the GPTNeoXForCausalLM class and only change the model name?

In addition, I tried setting pretrained_model_dir to something else, such as EleutherAI/pythia-160m, but the model call below in the forward_step function produces nan values:
model_outputs = model(input_ids=input_ids, past_key_values=past_key_values_prompt, return_dict=True)
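
For reference, a minimal sketch for locating where the nan first appears (variable names follow the snippet above; this is a debugging aid, not code from the repository):

import torch

# Check whether the nan already exists in the prefix past key/values...
for layer_idx, (k, v) in enumerate(past_key_values_prompt):
    if torch.isnan(k).any() or torch.isnan(v).any():
        print(f"nan in prefix key/values at layer {layer_idx}")

# ...or only appears in the model outputs
model_outputs = model(input_ids=input_ids, past_key_values=past_key_values_prompt, return_dict=True)
print("nan in logits:", torch.isnan(model_outputs.logits).any().item())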

So I would like to ask if there is any solution, thank you.