buttercutter / NLHF

A simple implementation of [Nash Learning from Human Feedback](https://arxiv.org/abs/2312.00886)

Repository from GitHub: https://github.com/buttercutter/NLHF
