llSourcell / Doctor-Dignity

Doctor Dignity is an LLM that can pass the US Medical Licensing Exam. It works offline, it's cross-platform, & your health data stays private.

Issue with ppo_trainer.generate()

aishu194 opened this issue · comments

Thank you for the clear and amazing video tutorial and repo. I have been working with this repo and ran into the following issue on an 8-GPU A100 machine with 100 GB of OS disk space and 5 TB of external storage. Could you kindly help me with this?

Traceback (most recent call last):
File "rl_finetuning.py", line 175, in
response_tensor = ppo_trainer.generate(query_tensor, pad_token_id=tokenizer.eos_token_id, max_new_tokens=20)
File "/data-mount/trl/trl/trainer/ppo_trainer.py", line 450, in generate
response = self.accelerator.unwrap_model(self.model).generate(
File "/data-mount/trl/trl/models/modeling_value_head.py", line 198, in generate
return self.pretrained_model.generate(*args, **kwargs)
File "/home/aishu/.local/lib/python3.8/site-packages/peft/peft_model.py", line 977, in generate
outputs = self.base_model.generate(**kwargs)
File "/home/aishu/.local/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/aishu/.local/lib/python3.8/site-packages/transformers/generation/utils.py", line 1642, in generate
return self.sample(
File "/home/aishu/.local/lib/python3.8/site-packages/transformers/generation/utils.py", line 2724, in sample
outputs = self(
File "/home/aishu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/aishu/.local/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 809, in forward
outputs = self.model(
File "/home/aishu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/aishu/.local/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 628, in forward
batch_size, seq_length = input_ids.shape
ValueError: too many values to unpack (expected 2)
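
For reference, the call at line 175 and the way the query is built look roughly like the sketch below. Only the generate() call itself matches the traceback exactly; the prompt text, the tokenization step, and the device are my best reconstruction, not the actual script.

    # Rough reconstruction of the rollout step around rl_finetuning.py line 175.
    # The prompt and .to("cuda:0") are illustrative placeholders.
    prompt = "Explain the pathophysiology of HIV/AIDS."
    query_tensor = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")

    response_tensor = ppo_trainer.generate(
        query_tensor,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=20,
    )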

I encountered the same error. Using python -m pdb, I investigated the tensor's shape at runtime.

It had the right 2-D shape initially:

(Pdb) up
> ..../pytorch1.13.1/lib/python3.9/site-packages/trl/trainer/ppo_trainer.py(454)generate()
-> response = self.accelerator.unwrap_model(self.model).generate(
(Pdb) p query_tensor
tensor([[    1, 12027,  7420,   278,  2224,  3021,   952, 29875,  3002,   310,
           379,  5667, 29914, 29909,  1367, 29903, 29889,  4121,   993, 13676,
         17091,  5065,  3381,   322,   521,   342,  6788,   363,   278,  4940,
          4723, 29889]], device='cuda:0')
(Pdb) p query_tensor.shape
torch.Size([1, 32])

But the code at line 455 added a new dimension with the following statement:

    input_ids=query_tensor.unsqueeze(dim=0)

As a result, when the code reaches

pytorch1.13.1/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py, line 623
batch_size, seq_length = input_ids.shape

(Pdb) p input_ids.shape
torch.Size([1, 1, 32])

The assignment then tries to unpack the 3-D shape into two variables, triggering the error:

ValueError: too many values to unpack (expected 2)
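
The shape problem can be reproduced with a few lines of plain PyTorch, without loading any model (the vocabulary size and sequence length below are arbitrary; the last line raises the ValueError):

    import torch

    # A query that is already batched, e.g. from tokenizer(..., return_tensors="pt")
    query_tensor = torch.randint(0, 32000, (1, 32))   # shape (1, 32)

    # ppo_trainer.generate() then adds another batch dimension:
    input_ids = query_tensor.unsqueeze(dim=0)          # shape (1, 1, 32)

    # The unpacking inside modeling_llama.py fails on the 3-D tensor:
    batch_size, seq_length = input_ids.shape           # ValueError: too many values to unpack (expected 2)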

This seems to be a bug in the package.

Somebody reported a similar problem and a solution: https://stackoverflow.com/questions/67193312/huggingface-transformers-returning-valueerror-too-many-values-to-unpack-expec

Essentially, the code needs to ignore the first dimension by using something like

    fakevar1, batch_size, seq_length = input_ids.shape
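
As a workaround that does not require patching transformers, passing an unbatched 1-D query also seems to avoid the error, since ppo_trainer.generate() unsqueezes its input as shown above. The prompt and tokenizer below are placeholders, so this is only a sketch against the TRL version in the traceback:

    # Workaround sketch: hand generate() a 1-D query so its internal
    # unsqueeze(dim=0) produces the expected (batch, seq_len) shape.
    query_tensor = tokenizer(prompt, return_tensors="pt").input_ids[0].to("cuda:0")  # shape (seq_len,)

    response_tensor = ppo_trainer.generate(
        query_tensor,                          # 1-D; unsqueezed to (1, seq_len) internally
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=20,
    )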

I suggest adding "ValueError: too many values to unpack (expected 2)" to the issue title so others can find this error more easily.