CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Unable to load and run inference on finetuned Alpaca model

doyled-it opened this issue · comments

🐛 Describe the bug

Overview

After training the declare-lab/flan-alpaca-gpt4-xl model for 1 step (to ensure that the saved model hadn't strayed too far from the original), loading the model through the Hugging Face APIs in any of the ways shown below produces output similar to the following, which is complete nonsense. The prompt used to produce this output was "tell me a joke", just as a sanity test.

I've trained with and without saving the optimizer (according to #545)

Winner application négoci Român application Winner push pilot négoci construi négoci application papa Winner Widerstand application papa Winner Widerstand application papa Winner Widerstand application papa Winner Widerstand application papa Winner Widerstand application papa Winner Widerstand application Widerstand application papa Winner Widerstand application papa Winner Widerstand application Widerstand application Widerstand application Widerstand application Widerstand application Widerstand application Widerstand application Widerstand application Widerstand application Widerstand application Widerstandtec construi négoci construi négoci construi négoci construi négoci construi négoci construi négoci construi négoci construi négoci
...

The strange part is that in the evaluation tables logged during training I don't see any of these outputs; there the generations look as they should. But I can't load the model for inference and have it produce those same outputs.

Configs

# In trlx 0.7.0 these config classes can be imported from trlx.data.default_configs
from trlx.data.default_configs import (
    ModelConfig,
    OptimizerConfig,
    PPOConfig,
    SchedulerConfig,
    TokenizerConfig,
    TrainConfig,
    TRLConfig,
)

llm_model = "declare-lab/flan-alpaca-gpt4-xl"
config = TRLConfig(
        train=TrainConfig(
            seq_length=2048,
            epochs=1_000,
            total_steps=1_000_000,
            batch_size=64,
            checkpoint_interval=250,
            eval_interval=1,
            pipeline="PromptPipeline",
            trainer="AcceleratePPOTrainer",
            save_optimizer=False,  # a bool, not the string "False"
            save_best=True,
            tracker="tensorboard",
            logging_dir=data.tensorboard_dir,
            checkpoint_dir=data.checkpoint_dir,
        ),
        model=ModelConfig(
            model_path=llm_model, model_arch_type="seq2seq", num_layers_unfrozen=2
        ),
        tokenizer=TokenizerConfig(tokenizer_path=llm_model, truncation_side="right"),
        optimizer=OptimizerConfig(
            name="adamw",
            kwargs=dict(lr=1.0e-5, betas=(0.9, 0.95), eps=1.0e-8, weight_decay=1.0e-6),
        ),
        scheduler=SchedulerConfig(
            name="cosine_annealing", kwargs=dict(T_max=10_000, eta_min=1.0e-5)
        ),
        method=PPOConfig(
            name="PPOConfig",
            num_rollouts=16,
            chunk_size=2,
            ppo_epochs=8,
            init_kl_coef=0.05,
            target=6,
            horizon=10_000,
            gamma=1,
            lam=0.95,
            cliprange=0.2,
            cliprange_value=0.2,
            vf_coef=1,
            scale_reward="ignored",
            ref_mean=None,
            ref_std=None,
            cliprange_reward=10,
            gen_kwargs=dict(
                max_new_tokens=40,
                top_k=0,
                top_p=1.0,
                do_sample=True,
            ),
        )
)
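For context, the config above is then handed to trlx.train. The following is just a minimal sketch of that call under the same setup; the reward function and prompt lists are trivial placeholders, since the real ones aren't included in this issue:

import trlx

# Placeholder reward and prompts, only to show how the config is wired in;
# the actual reward function and prompt sets are not part of this issue.
def reward_fn(samples, prompts, outputs, **kwargs):
    return [float(len(output)) for output in outputs]

train_prompts = ["tell me a joke"] * 64
eval_prompts = ["tell me a joke"] * 8

trainer = trlx.train(
    llm_model,
    reward_fn=reward_fn,
    prompts=train_prompts,
    eval_prompts=eval_prompts,
    config=config,
)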

Ways I've tried to load the model for inference

  1. Pipeline
from transformers import pipeline

# `path` points at the saved trlx checkpoint directory (left unspecified here)
pipe = pipeline(
    "text2text-generation",
    model=str(path),
    tokenizer="declare-lab/flan-alpaca-gpt4-xl",
)

text = pipe("tell me a joke")
  2. T5
from transformers import AutoTokenizer, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained(str(path))
tokenizer = AutoTokenizer.from_pretrained("declare-lab/flan-alpaca-gpt4-xl")
test_prompt = "tell me a joke"
input_ids = tokenizer(test_prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_length=1000, do_sample=True)
  3. AutoModelForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(str(path))
tokenizer = AutoTokenizer.from_pretrained("declare-lab/flan-alpaca-gpt4-xl")
test_prompt = "tell me a joke"
input_ids = tokenizer(test_prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_length=1000, do_sample=True)

All of these attempts produce similar output, except for 3, which errors when I try to load the model with from_pretrained.
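For what it's worth, the from_pretrained error in attempt 3 is likely expected: flan-alpaca-gpt4-xl is a T5-based encoder-decoder model (hence model_arch_type="seq2seq" in the config), so AutoModelForCausalLM has no matching architecture for it. A minimal sketch of loading the checkpoint through the seq2seq auto class instead (the checkpoint path is a placeholder):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# "path/to/checkpoint" stands in for the saved trlx checkpoint directory
model = AutoModelForSeq2SeqLM.from_pretrained("path/to/checkpoint")
tokenizer = AutoTokenizer.from_pretrained("declare-lab/flan-alpaca-gpt4-xl")

input_ids = tokenizer("tell me a joke", return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=40, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))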

Which trlX version are you using?

0.7.0 2e667e6

Additional system and package information

Python 3.10.11, Transformers 4.30.1

commented

Thanks @doyled-it for opening this issue! I can confirm this is the case and will try to resolve it shortly.

@maxreciprocate thanks for looking into it! I also forgot to mention that, in order for the checkpoint to load at all, I have to copy the model's config.json into the checkpoint directory.
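For anyone hitting the same thing before the fix, a hedged sketch of that workaround, copying the base model's config.json into the checkpoint directory before loading (the checkpoint path is a placeholder):

import shutil

from huggingface_hub import hf_hub_download

# Download the base model's config.json from the Hub and copy it into the
# trlx checkpoint directory (placeholder path) so from_pretrained can find it.
config_path = hf_hub_download("declare-lab/flan-alpaca-gpt4-xl", "config.json")
shutil.copy(config_path, "path/to/checkpoint/config.json")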

Thank you @maxreciprocate for merging that so quickly. I can confirm that the model now loads when I point to the hf_model directory, and it gives me an output that makes sense:

from pathlib import Path

from transformers import pipeline

path = Path("path/to/best_checkpoint/hf_model")
pipe = pipeline(
    "text2text-generation",
    model=str(path),
    tokenizer="declare-lab/flan-alpaca-gpt4-xl",
)

text = pipe("tell me a joke")

print(text)

Output:

[{'generated_text': "Why don't scientists trust atoms? Because they make up everything!"}]

Just to make sure: With PR #551, trlx is now saving a trained version of the model with updated weights? And that's what I'm loading when I use the code above?

commented

@doyled-it that should be the case, yes
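One hedged way to double-check that (the checkpoint path is a placeholder): load both the base model and the saved hf_model checkpoint and compare their parameters; if training actually updated the weights, at least some tensors should differ.

import torch
from transformers import T5ForConditionalGeneration

base = T5ForConditionalGeneration.from_pretrained("declare-lab/flan-alpaca-gpt4-xl")
tuned = T5ForConditionalGeneration.from_pretrained("path/to/best_checkpoint/hf_model")

# Count parameter tensors that changed relative to the base model.
changed = [
    name
    for (name, p_base), (_, p_tuned) in zip(base.named_parameters(), tuned.named_parameters())
    if not torch.equal(p_base, p_tuned)
]
print(f"{len(changed)} parameter tensors differ from the base model")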

@doyled-it @maxreciprocate, this is very helpful.
In terms of using save_optimizer = False in my configs file, how should my training script be set up?

Initially, while I was using save_optimizer = True (that's the default; I didn't add the variable to my configs file), my training file was as follows:

trainer = trlx.train(
    config.model.model_path,
    reward_fn=batch_reward_fn,
    prompts=repeated_train_prompts,
    eval_prompts=eval_prompts,
    config=config,
)

Now that I am setting save_optimizer = False, will my training file remain the same, or do I have to add model.save_pretrained("rlhf_trained_model")? See below:

trainer = trlx.train(
    config.model.model_path,
    reward_fn=batch_reward_fn,
    prompts=repeated_train_prompts,
    eval_prompts=eval_prompts,
    config=config,
)
model = trainer.model
model.save_pretrained("rlhf_trained_model")

Thank you

@promiseve

trainer = trlx.train(
    config.model.model_path,
    reward_fn=batch_reward_fn,
    prompts=repeated_train_prompts,
    eval_prompts=eval_prompts,
    config=config,
)
model = trainer.model
model.save_pretrained("rlhf_trained_model")

You don't need to run model.save_pretrained, I believe. It will save the model(s) to the checkpoint_dir just like before, so long as you've set that variable in the TrainConfig, if I understand the question correctly.
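Based on the directory layout seen earlier in this thread (best_checkpoint/hf_model under the checkpoint directory), here is a hedged sketch of reloading the trained model straight from checkpoint_dir; the "ckpts" value is just the default mentioned in the following comment and should match TrainConfig.checkpoint_dir in your config:

from pathlib import Path

from transformers import pipeline

# checkpoint_dir must match TrainConfig.checkpoint_dir from your config;
# the best_checkpoint/hf_model layout mirrors the path used earlier in this thread.
checkpoint_dir = Path("ckpts")
pipe = pipeline(
    "text2text-generation",
    model=str(checkpoint_dir / "best_checkpoint" / "hf_model"),
    tokenizer="declare-lab/flan-alpaca-gpt4-xl",
)

print(pipe("tell me a joke"))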

Hi @doyled-it, it has been a while, since this is a side project for me. I had to update to the current trlx version, and I now have my training pipeline set up again. Unfortunately, when I run the scripts and include save_optimizer = False, I can't find where the model is stored. Previously, it was stored by default in "ckpts", or in whatever path I specified as the checkpoint directory in the config.
Now none of those directories contain the saved model. Is the model saved to a different path by default when using save_optimizer=False?

Thanks