CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Implement Asynchronous PPO

Dahoas opened this issue a year ago

Alex Havrilla commented a year ago

🚀 The feature, motivation, and pitch

Implementing asynchronous PPO would mitigate model rollout (exploration), currently the largest bottleneck in the training process, by decoupling rollout generation from policy updates.
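One way to picture the proposal, assuming the common actor/learner split: rollout workers keep generating with whatever weights are current, while a learner consumes finished rollouts from a bounded queue. This is a toy Python sketch, not trlx code. `SharedPolicy`, `rollout_worker`, `learner`, and `run_async_ppo` are hypothetical names. Generation and the PPO update are stubbed out, and threads stand in for what would realistically be separate processes or devices.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Rollout:
    # Hypothetical rollout record: which worker produced it and with which policy version.
    worker_id: int
    policy_version: int


class SharedPolicy:
    """Toy stand-in for model weights: a version counter guarded by a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version = 0

    def snapshot(self) -> int:
        with self._lock:
            return self._version

    def apply_update(self) -> None:
        with self._lock:
            self._version += 1


def rollout_worker(policy, buffer, stop, worker_id):
    # Generation loop: samples with the latest available weights and never
    # waits for the learner to finish an optimization step.
    while not stop.is_set():
        rollout = Rollout(worker_id=worker_id, policy_version=policy.snapshot())
        try:
            buffer.put(rollout, timeout=0.1)  # blocks only while the buffer is full
        except queue.Full:
            continue  # re-check the stop flag, then retry


def learner(policy, buffer, num_updates, batch_size):
    staleness = []
    for _ in range(num_updates):
        batch = [buffer.get() for _ in range(batch_size)]
        current = policy.snapshot()
        # Policy lag: how many updates behind the learner each rollout's generator was.
        staleness.extend(current - r.policy_version for r in batch)
        # Placeholder for a PPO step (assumption: a real version would compute the
        # clipped objective using log-probs recorded by the generating policy).
        policy.apply_update()
    return staleness


def run_async_ppo(num_workers=2, num_updates=5, batch_size=4, max_buffer=8):
    policy = SharedPolicy()
    buffer = queue.Queue(maxsize=max_buffer)
    stop = threading.Event()
    workers = [
        threading.Thread(target=rollout_worker, args=(policy, buffer, stop, i), daemon=True)
        for i in range(num_workers)
    ]
    for w in workers:
        w.start()
    staleness = learner(policy, buffer, num_updates, batch_size)
    stop.set()
    for w in workers:
        w.join()
    return policy.snapshot(), staleness


if __name__ == "__main__":
    version, lag = run_async_ppo()
    print("final policy version:", version, "max lag:", max(lag))
```

The bounded queue is the main design lever in this sketch. A larger buffer keeps the learner busier but lets rollouts grow more stale, and the returned `staleness` list makes that off-policy lag visible. Handling that lag is the part an actual asynchronous PPO implementation would need to get right.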

Alternatives

No response

Additional context

No response