lucidrains / self-rewarding-lm-pytorch

Implementation of the training framework proposed in Self-Rewarding Language Models, from MetaAI

I encountered the following error when trying to run the usage example

Yanfors opened this issue

```
sft fine-tuning: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:06<00:00, 1.43it/s]
generating dpo dataset with self-rewarding: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/root/yx/self_rewarding/test1/usage.py", line 45, in <module>
    trainer(overwrite_checkpoints = True)
  File "/root/anaconda3/envs/test1/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/test1/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/test1/lib/python3.12/site-packages/self_rewarding_lm_pytorch/self_rewarding_lm_pytorch.py", line 950, in forward
    trainer()
  File "/root/anaconda3/envs/test1/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/test1/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/test1/lib/python3.12/site-packages/self_rewarding_lm_pytorch/dpo.py", line 442, in forward
    train_self_reward_dataset = self.dataset_generator()
                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/test1/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/test1/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/test1/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/test1/lib/python3.12/site-packages/self_rewarding_lm_pytorch/self_rewarding_lm_pytorch.py", line 577, in forward
    rewards: List[Optional[float]] = [self.generate_reward(prompt, response) for response in candidate_responses]
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/test1/lib/python3.12/site-packages/self_rewarding_lm_pytorch/self_rewarding_lm_pytorch.py", line 509, in generate_reward
    self_reward_model = self_reward_model.to(device)
                        ^^^^^^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'self_reward_model' where it is not associated with a value
generating dpo dataset with self-rewarding: 0it [01:38, ?it/s]
```

The SPIN usage example, by contrast, runs successfully for me.
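
For what it's worth, this kind of `UnboundLocalError` usually means the variable is only assigned inside one branch of a conditional, so when that branch is skipped the name is never bound. Here is a minimal sketch of the pattern; the names below are hypothetical stand-ins, not the library's actual code:

```python
from typing import Optional


class DummyModel:
    """Hypothetical stand-in for a reward model."""

    def to(self, device: str) -> "DummyModel":
        return self


def generate_reward(reward_model: Optional[DummyModel], device: str = "cpu"):
    # Buggy pattern: `self_reward_model` is only bound inside this branch.
    if reward_model is not None:
        self_reward_model = reward_model

    # If the branch above was skipped, this line raises
    # "UnboundLocalError: cannot access local variable 'self_reward_model'
    # where it is not associated with a value" -- the same error as above.
    return self_reward_model.to(device)


try:
    generate_reward(None)
except UnboundLocalError as e:
    print(e)
```

If `generate_reward` at line 509 of `self_rewarding_lm_pytorch.py` follows this pattern, then presumably some configuration that would normally bind `self_reward_model` is not being set up by `usage.py` in my environment, which would also explain why the SPIN path works while the self-rewarding path fails.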