kliu128 / nanoGPT-RL

RLHF with PPO from scratch, on GPT-2, using nanoGPT.

