clam004 / minichatgpt

An annotated tutorial of the Hugging Face TRL repo for reinforcement learning from human feedback (RLHF), connecting the equations of PPO and GAE to the corresponding lines of code in the PyTorch implementation.
