CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

RLHF text summarization diverges

AlisonWen opened this issue

πŸ› Describe the bug

I am running the trlx_gptj_text_summarization.py experiment without modifying the code, but it has not converged after more than 3,500 steps, even though the documentation says it should. I also noticed that the sample project was run with trlx_gptneo_text_summarization.py, but I cannot find that file anywhere.

Which trlX version are you using?

Downloaded from source on 2024/01/13.

Additional system and package information

Linux (Ubuntu 22.04 "Jammy"), torch==2.0.0+cu118
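A minimal sketch of how the system and package details above could be collected automatically when filing a report like this. It uses only the Python standard library (`platform`, `importlib.metadata`). The package list is an illustrative choice, not something trlX requires, and any package that is not installed is reported as `None` rather than raising an error.

```python
import platform
from importlib import metadata


def collect_env_info(packages=("torch", "transformers", "trlx")):
    """Return OS/Python details plus installed versions of `packages`.

    The package names are an assumption chosen for illustration;
    missing packages map to None instead of raising.
    """
    info = {
        "platform": platform.platform(),      # e.g. kernel and distro string
        "python": platform.python_version(),  # interpreter version
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # not installed in this environment
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Note that the version string for a CUDA build of PyTorch (such as the `+cu118` suffix above) is reported as part of the installed distribution version.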