CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Unable to load the trained model to do the inference

CSerxy opened this issue

I ran PPO training with Llama and stored the best model at ../ckpts/best_checkpoint/.

I am not able to load the trained model for inference using the code below, which was suggested in another issue:

    import torch
    from trlx.models.modeling_ppo import AutoModelForCausalLMWithHydraValueHead

    path = "../ckpts/best_checkpoint"  # path to the saved checkpoint
    model = AutoModelForCausalLMWithHydraValueHead.from_pretrained(path)

This was suggested by @glerzing in #480 (comment) and by @PhungVanDuy in #365.

The returned error is: OSError: ../ckpts/best_checkpoint does not appear to have a file named config.json. Checkout 'https://huggingface.co/../test_1shot-qd_0shot_davinci003-ag_init_kl_0.3/best_checkpoint/None' for available files.

I searched this repo and didn't find any documentation on how to run inference after training. I hope this will be added soon.

Is there a convenient way to convert this checkpoint to a Hugging Face checkpoint without retraining? And if we can directly load this checkpoint, that would be perfect.

For more information, these are the files and folders under best_checkpoint:

(screenshot of the best_checkpoint directory contents)

And these are the files under the pytorch_model folder:

(screenshot of the pytorch_model directory contents)


The cause of this problem is that we support two different ways to save the model: Hugging Face format, and saving together with the optimizer state. You can check the logic of the implementation here. If you only want to save in Hugging Face format, you can set save_optimizer = False in the train config (TrainConfig).
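For reference, a minimal sketch of what that might look like, assuming you start from one of the stock example configs (the YAML path is just an example; adjust it to the config you actually use):

    from trlx.data.configs import TRLConfig

    config = TRLConfig.load_yaml("configs/ppo_config.yml")  # example config path
    config.train.save_optimizer = False  # save checkpoints in plain Hugging Face format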
In your case, you can load your model this way:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # load the base model that PPO training started from
    model = AutoModelForCausalLM.from_pretrained('/hf/url/to/your/base/model')  # e.g. NousResearch/Llama-2-7b-hf
    # the DeepSpeed checkpoint stores the module weights under the 'module' key
    state_dict = torch.load("./pytorch_model/mp_rank_00_model_states.pt")['module']
    # drop the value-head weights, which the plain causal LM does not have
    del state_dict['v_head.0.weight'], state_dict['v_head.0.bias'], state_dict['v_head.2.weight'], state_dict['v_head.2.bias']
    model.load_state_dict(state_dict)
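
As an optional follow-up, once the weights are loaded this way you can re-save them in plain Hugging Face format so that future loads can use from_pretrained directly (the output path below is just an example):

    model.save_pretrained("../ckpts/best_checkpoint_hf")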

Hope this helps you solve the problem.

Thanks for the quick reply! @PhungVanDuy

I see. Do you mean that if I only want to save the model in Hugging Face format, I just need to set save_optimizer=False during training?

So the code you showed above is for the case where we already have a model in a non-Hugging Face format, and it can help us load it?

Moreover, are we supposed to replace '/hf/url/to/your/base/model' with 'ckpts/best_checkpoint' if we stored it in this path?

Hi @PhungVanDuy ,

I am still not able to load the trained model. For example, an error was raised when I tried "state_dict = torch.load("./pytorch_model/mp_rank_00_model_states.pt")['module']".

And I couldn't find the file "./pytorch_model/mp_rank_00_model_states.pt" under the trained model's folder. Can you help me with that?

Best


Sorry for the late response. I saw in the image you posted that it contains mp_rank_00_model_states.pt; can you check for this file?

Thanks for the reply!

Yeah, but the above screenshot is of the model trained with save_optimizer = True. So you mean the code you showed is for loading the model when save_optimizer = True?

When I trained with save_optimizer = False, I didn't find this file. So in this case, can I load the model as a normal Hugging Face model?


@CSerxy yes! If you used save_optimizer=False, you should be able to load the model in HF format with AutoModelForCausalLM.from_pretrained("ckpts/best_checkpoint").
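
For completeness, a minimal inference sketch under that assumption; it also assumes the tokenizer was saved to the same directory (if not, load the tokenizer from the base model instead):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("ckpts/best_checkpoint")
    tokenizer = AutoTokenizer.from_pretrained("ckpts/best_checkpoint")

    inputs = tokenizer("An example prompt:", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))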

Yes, I am able to do it. Many thanks! I think we can close this issue.

@CSerxy @PhungVanDuy @maxreciprocate, this is very helpful.
When using save_optimizer = false in my configs file, how should my training code be set up?

Initially, while using save_optimizer = true (that's the default; I didn't add the variable to my configs file), my training code was as follows:
    trainer = trlx.train(
        config.model.model_path,
        reward_fn=batch_reward_fn,
        prompts=repeated_train_prompts,
        eval_prompts=eval_prompts,
        config=config,
    )

Now that I am setting save_optimizer = false, will my training code remain the same, or do I have to add model.save_pretrained("rlhf_trained_model")? See below:

trainer = trlx.train(
    config.model.model_path,
    reward_fn=batch_reward_fn,
    prompts=repeated_train_prompts,
    eval_prompts=eval_prompts,
    config=config,
)
    model = trainer.model
    model.save_pretrained("rlhf_trained_model")

Thank you