How to generate a reward-labeled dataset

Question

mikkelmedm opened this issue 10 months ago

🚀 The feature, motivation, and pitch

I would like to fine-tune a model using either a reward model or a reward-labeled dataset, but I cannot find any reference describing what such a dataset looks like or how to generate one. I would appreciate an explanation, as I am new to this.

Alternatives

No response

Additional context

No response