CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Implement Asynchronous PPO

Dahoas opened this issue

🚀 The feature, motivation, and pitch

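Model rollout (exploration) is the largest bottleneck in the training process. Implementing an asynchronous PPO, in which new rollouts are generated while the policy is being optimized on earlier ones, would mitigate this bottleneck.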
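One way to read the proposal is as a producer/consumer split. A sampler keeps generating rollouts while the learner runs PPO updates on batches that were already produced, so generation time overlaps with optimization. The following is a toy sketch under that assumption only. `generate_rollouts`, `ppo_update`, and `train_async` are hypothetical placeholders, not part of the trlx API. Threads stand in for what would likely be separate processes or devices in practice. A bounded queue (`max_queue`) caps how far the sampler can run ahead of the learner.

```python
import queue
import threading
import time
from dataclasses import dataclass


@dataclass
class Rollout:
    policy_version: int  # version of the policy that generated this batch
    samples: list        # placeholder for (prompt, response, reward) tuples


def generate_rollouts(policy_version: int, batch_size: int) -> Rollout:
    # Hypothetical stand-in for slow model sampling plus reward scoring.
    time.sleep(0.01)
    return Rollout(policy_version, [f"sample-{policy_version}-{i}" for i in range(batch_size)])


def ppo_update(rollout: Rollout) -> None:
    # Hypothetical stand-in for one PPO optimization step on a batch.
    time.sleep(0.005)


def train_async(num_updates: int, batch_size: int = 4, max_queue: int = 2) -> list[int]:
    """Overlap rollout generation with PPO updates; return per-batch staleness."""
    buffer: queue.Queue = queue.Queue(maxsize=max_queue)  # bounded: limits lag
    version_lock = threading.Lock()
    state = {"version": 0}  # incremented after every learner update
    stop = threading.Event()

    def actor() -> None:
        # Producer: snapshot the current policy version, then sample with it.
        while not stop.is_set():
            with version_lock:
                version = state["version"]
            rollout = generate_rollouts(version, batch_size)
            # Retry with a timeout so a full queue cannot hide the stop signal.
            while not stop.is_set():
                try:
                    buffer.put(rollout, timeout=0.05)
                    break
                except queue.Full:
                    continue

    worker = threading.Thread(target=actor, daemon=True)
    worker.start()

    staleness = []
    for _ in range(num_updates):
        rollout = buffer.get()  # consumer: blocks until a batch is ready
        with version_lock:
            # How many updates the policy has taken since this batch was sampled.
            staleness.append(state["version"] - rollout.policy_version)
        ppo_update(rollout)
        with version_lock:
            state["version"] += 1

    stop.set()
    worker.join()
    return staleness


if __name__ == "__main__":
    print(train_async(num_updates=10))
```

The returned staleness values show the trade-off: overlapping sampling with training means some batches come from a slightly older policy. With a single sampler, each batch lags by at most `max_queue + 1` updates here. How a real implementation would handle that lag (for example, by recomputing log-probabilities or discarding overly stale batches) is a design choice the issue leaves open.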

Alternatives

No response

Additional context

No response