How to generate a reward-labeled dataset
mikkelmedm opened this issue · comments
The feature, motivation, and pitch
I would like to fine-tune a model using either a reward model or a reward-labeled dataset, but I cannot find any reference that describes what such a dataset looks like or how to generate one. I would appreciate it if you could elaborate, as I am new to this.
Alternatives
No response
Additional context
No response