CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Unable to load and run inference on finetuned Alpaca model

doyled-it opened this issue · comments

🐛 Describe the bug

Overview

After training the declare-lab/flan-alpaca-gpt4-xl model for 1 step (to ensure that the saved model hadn't strayed too far from the original), loading the model through the Hugging Face APIs in any of the ways shown below produces output similar to the following, which is complete nonsense. The prompt used to produce this output was "tell me a joke", just as a sanity test.

I've trained with and without saving the optimizer (according to #545)

Winner application négoci Român application Winner push pilot négoci construi négoci application papa Winner Widerstand application papa Winner Widerstand application papa Winner Widerstand application papa Winner Widerstand application papa Winner Widerstand application papa Winner Widerstand application Widerstand application papa Winner Widerstand application papa Winner Widerstand application Widerstand application Widerstand application Widerstand application Widerstand application Widerstand application Widerstand application Widerstand application Widerstand application Widerstand application Widerstandtec construi négoci construi négoci construi négoci construi négoci construi négoci construi négoci construi négoci construi négoci
...

The strange part is that in the evaluation tables logged during training I don't see any of these outputs; there the generations look as they should. But I can't load the model for inference and have it produce those same outputs.

Configs

# In trlx 0.7.0 these config classes can be imported from trlx.data.default_configs
from trlx.data.default_configs import (
    ModelConfig,
    OptimizerConfig,
    PPOConfig,
    SchedulerConfig,
    TokenizerConfig,
    TrainConfig,
    TRLConfig,
)

llm_model = "declare-lab/flan-alpaca-gpt4-xl"
config = TRLConfig(
        train=TrainConfig(
            seq_length=2048,
            epochs=1_000,
            total_steps=1_000_000,
            batch_size=64,
            checkpoint_interval=250,
            eval_interval=1,
            pipeline="PromptPipeline",
            trainer="AcceleratePPOTrainer",
            save_optimizer=False,  # a bool, not the string "False"
            save_best=True,
            tracker="tensorboard",
            logging_dir=data.tensorboard_dir,
            checkpoint_dir=data.checkpoint_dir,
        ),
        model=ModelConfig(
            model_path=llm_model, model_arch_type="seq2seq", num_layers_unfrozen=2
        ),
        tokenizer=TokenizerConfig(tokenizer_path=llm_model, truncation_side="right"),
        optimizer=OptimizerConfig(
            name="adamw",
            kwargs=dict(lr=1.0e-5, betas=(0.9, 0.95), eps=1.0e-8, weight_decay=1.0e-6),
        ),
        scheduler=SchedulerConfig(
            name="cosine_annealing", kwargs=dict(T_max=10_000, eta_min=1.0e-5)
        ),
        method=PPOConfig(
            name="PPOConfig",
            num_rollouts=16,
            chunk_size=2,
            ppo_epochs=8,
            init_kl_coef=0.05,
            target=6,
            horizon=10_000,
            gamma=1,
            lam=0.95,
            cliprange=0.2,
            cliprange_value=0.2,
            vf_coef=1,
            scale_reward="ignored",
            ref_mean=None,
            ref_std=None,
            cliprange_reward=10,
            gen_kwargs=dict(
                max_new_tokens=40,
                top_k=0,
                top_p=1.0,
                do_sample=True,
            ),
        )
)
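For context, the config above is then handed to trlx.train. The following is just a minimal sketch of that call under the same setup; the reward function and prompt lists are trivial placeholders, since the real ones aren't included in this issue:

import trlx

# Placeholder reward and prompts, only to show how the config is wired in;
# the actual reward function and prompt sets are not part of this issue.
def reward_fn(samples, prompts, outputs, **kwargs):
    return [float(len(output)) for output in outputs]

train_prompts = ["tell me a joke"] * 64
eval_prompts = ["tell me a joke"] * 8

trainer = trlx.train(
    llm_model,
    reward_fn=reward_fn,
    prompts=train_prompts,
    eval_prompts=eval_prompts,
    config=config,
)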

Ways I've tried to load the model for inference

  1. Pipeline
from transformers import pipeline

# `path` points at the saved trlx checkpoint directory (left unspecified here)
pipe = pipeline(
    "text2text-generation",
    model=str(path),
    tokenizer="declare-lab/flan-alpaca-gpt4-xl",
)

text = pipe("tell me a joke")
  2. T5
from transformers import AutoTokenizer, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained(str(path))
tokenizer = AutoTokenizer.from_pretrained("declare-lab/flan-alpaca-gpt4-xl")
test_prompt = "tell me a joke"
input_ids = tokenizer(test_prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_length=1000, do_sample=True)
  3. AutoModelForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(str(path))
tokenizer = AutoTokenizer.from_pretrained("declare-lab/flan-alpaca-gpt4-xl")
test_prompt = "tell me a joke"
input_ids = tokenizer(test_prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_length=1000, do_sample=True)

All of these attempts produce similar output, except for 3, which errors when I try to load the model with from_pretrained.
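For what it's worth, the from_pretrained error in attempt 3 is likely expected: flan-alpaca-gpt4-xl is a T5-based encoder-decoder model (hence model_arch_type="seq2seq" in the config), so AutoModelForCausalLM has no matching architecture for it. A minimal sketch of loading the checkpoint through the seq2seq auto class instead (the checkpoint path is a placeholder):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# "path/to/checkpoint" stands in for the saved trlx checkpoint directory
model = AutoModelForSeq2SeqLM.from_pretrained("path/to/checkpoint")
tokenizer = AutoTokenizer.from_pretrained("declare-lab/flan-alpaca-gpt4-xl")

input_ids = tokenizer("tell me a joke", return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=40, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))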

Which trlX version are you using?

0.7.0 2e667e6

Additional system and package information

Python 3.10.11, Transformers 4.30.1

commented

Thanks @doyled-it for opening this issue! I can confirm this is the case and will try to resolve it shortly.

@maxreciprocate thanks for looking into it! I also forgot to mention that, in order for the checkpoint to load at all, I have to copy the model's config.json into the checkpoint directory.
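For anyone hitting the same thing before the fix, a hedged sketch of that workaround, copying the base model's config.json into the checkpoint directory before loading (the checkpoint path is a placeholder):

import shutil

from huggingface_hub import hf_hub_download

# Download the base model's config.json from the Hub and copy it into the
# trlx checkpoint directory (placeholder path) so from_pretrained can find it.
config_path = hf_hub_download("declare-lab/flan-alpaca-gpt4-xl", "config.json")
shutil.copy(config_path, "path/to/checkpoint/config.json")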

Thank you @maxreciprocate for merging that so quickly. I can confirm that the model now loads when I point to the hf_model directory, and it gives me an output that makes sense:

from pathlib import Path

from transformers import pipeline

path = Path("path/to/best_checkpoint/hf_model")
pipe = pipeline(
    "text2text-generation",
    model=str(path),
    tokenizer="declare-lab/flan-alpaca-gpt4-xl",
)

text = pipe("tell me a joke")

print(text)

Output:

[{'generated_text': "Why don't scientists trust atoms? Because they make up everything!"}]

Just to make sure: With PR #551, trlx is now saving a trained version of the model with updated weights? And that's what I'm loading when I use the code above?

commented

@doyled-it that should be the case, yes
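One hedged way to double-check that (the checkpoint path is a placeholder): load both the base model and the saved hf_model checkpoint and compare their parameters; if training actually updated the weights, at least some tensors should differ.

import torch
from transformers import T5ForConditionalGeneration

base = T5ForConditionalGeneration.from_pretrained("declare-lab/flan-alpaca-gpt4-xl")
tuned = T5ForConditionalGeneration.from_pretrained("path/to/best_checkpoint/hf_model")

# Count parameter tensors that changed relative to the base model.
changed = [
    name
    for (name, p_base), (_, p_tuned) in zip(base.named_parameters(), tuned.named_parameters())
    if not torch.equal(p_base, p_tuned)
]
print(f"{len(changed)} parameter tensors differ from the base model")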

@doyled-it @maxreciprocate, this is very helpful.
In terms of using save_optimizer = False in my configs file, how should my training script be set up?

Initially, while I was using save_optimizer = True (that's the default; I didn't add the variable to my configs file), my training file was as follows:

trainer = trlx.train(
    config.model.model_path,
    reward_fn=batch_reward_fn,
    prompts=repeated_train_prompts,
    eval_prompts=eval_prompts,
    config=config,
)

Now that I am setting save_optimizer = False, will my training file remain the same, or do I have to add model.save_pretrained("rlhf_trained_model")? See below:

trainer = trlx.train(
    config.model.model_path,
    reward_fn=batch_reward_fn,
    prompts=repeated_train_prompts,
    eval_prompts=eval_prompts,
    config=config,
)
model = trainer.model
model.save_pretrained("rlhf_trained_model")

Thank you

@promiseve

trainer = trlx.train(
    config.model.model_path,
    reward_fn=batch_reward_fn,
    prompts=repeated_train_prompts,
    eval_prompts=eval_prompts,
    config=config,
)
model = trainer.model
model.save_pretrained("rlhf_trained_model")

You don't need to run model.save_pretrained, I believe. It will save the model(s) to the checkpoint_dir just like before, so long as you've set that variable in the TrainConfig, if I understand the question correctly.
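Based on the directory layout seen earlier in this thread (best_checkpoint/hf_model under the checkpoint directory), here is a hedged sketch of reloading the trained model straight from checkpoint_dir; the "ckpts" value is just the default mentioned in the following comment and should match TrainConfig.checkpoint_dir in your config:

from pathlib import Path

from transformers import pipeline

# checkpoint_dir must match TrainConfig.checkpoint_dir from your config;
# the best_checkpoint/hf_model layout mirrors the path used earlier in this thread.
checkpoint_dir = Path("ckpts")
pipe = pipeline(
    "text2text-generation",
    model=str(checkpoint_dir / "best_checkpoint" / "hf_model"),
    tokenizer="declare-lab/flan-alpaca-gpt4-xl",
)

print(pipe("tell me a joke"))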

Hi @doyled-it, it has been a while, since this is a side project for me. I had to update to the current trlx version, and I now have my training pipeline set up again. Unfortunately, when I run the scripts and include save_optimizer = False, I can't find where the model is stored. Previously, it was stored by default in "ckpts", or in whatever path I specified as the checkpoint directory in the config.
Now none of those directories contain the saved model. Is the model saved to a different path by default when using save_optimizer=False?

Thanks