yuniaXian / ppo_llm_DeepSpeed

Customized LLM PPO (reinforcement learning) pipeline built with DeepSpeed. For Amex external usage. Trains a reward model and actor-critic models against a reference supervised fine-tuned model.

Home Page: https://www.deepspeed.ai/

yuniaXian/ppo_llm_DeepSpeed Stargazers