What is the main library to scale up RL training for LLMs?
aldopareja opened this issue
Assume you have a reward model (say, the Open Assistant reward model) and a target model (say, LLaMA) that you want to train with RL at scale on a multi-node setup. What is the best codebase for this? DeepSpeed-Chat?