clam004 / minichatgpt

annotated tutorial of the huggingface TRL repo for reinforcement learning from human feedback connecting equations from PPO and GAE to the lines of code in the pytorch implementation