CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

How to generate reward-labeled dataset

mikkelmedm opened this issue

🚀 The feature, motivation, and pitch

I would like to fine-tune using either a reward model or a reward-labeled dataset, but I am unable to find any reference to what such a dataset looks like or how to generate one. I hope you can elaborate, as I am new to this.

Alternatives

No response

Additional context

No response