jannerm / ddpo

Code for the paper "Training Diffusion Models with Reinforcement Learning"

Home Page: https://rl-diffusion.github.io

Reward function for both aesthetics and prompt alignment

scarbain opened this issue

Hi,

Wonderful project, congratulations!

Have you tried using a reward function that covers both objectives (one possible combination is sketched below)? The aesthetic reward does make the generations look better, but it also oversimplifies them, posing a single subject in the center with a blurred background.

Also, do you have any insights on how to plug in a new reward function? Should it be normalized to a particular min-max range? How long does it take to train each model, and does that vary across reward functions?

Thanks :)
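As a rough illustration of the combined reward the question asks about, here is a minimal sketch. It assumes two hypothetical callables, `aesthetic_reward` and `alignment_reward`, that map a batch of images and prompts to per-sample scores; these names are placeholders, not part of the ddpo codebase, and the per-batch standardization is just one plausible way to keep the two scales comparable, not something prescribed by the paper.

```python
import numpy as np

def combined_reward(images, prompts, aesthetic_reward, alignment_reward,
                    weight=0.5, eps=1e-6):
    """Weighted sum of two reward signals.

    Each signal is standardized within the batch so that neither scale
    dominates the combination. `weight` trades off aesthetics vs. alignment.
    """
    a = np.asarray(aesthetic_reward(images, prompts), dtype=np.float32)
    b = np.asarray(alignment_reward(images, prompts), dtype=np.float32)
    # Per-batch standardization (an assumption, not the paper's recipe).
    a = (a - a.mean()) / (a.std() + eps)
    b = (b - b.mean()) / (b.std() + eps)
    return weight * a + (1.0 - weight) * b
```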

For both prompt alignment and aesthetics, CarperAI used the PickScore model: https://github.com/CarperAI/DRLX
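For reference, here is a rough sketch of scoring (image, prompt) pairs with PickScore through HuggingFace `transformers`. The model and processor IDs follow the public PickScore release; `pickscore_reward` is an illustrative wrapper name, not code from DRLX or ddpo, so treat the exact preprocessing arguments as assumptions.

```python
import torch
from transformers import AutoProcessor, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# IDs taken from the public PickScore release.
processor = AutoProcessor.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
model = AutoModel.from_pretrained("yuvalkirstain/PickScore_v1").eval().to(device)

@torch.no_grad()
def pickscore_reward(images, prompts):
    """Score each (image, prompt) pair with PickScore.

    `images` is a list of PIL images and `prompts` a list of strings of the
    same length; returns one score per pair.
    """
    image_inputs = processor(images=images, return_tensors="pt").to(device)
    text_inputs = processor(text=prompts, padding=True, truncation=True,
                            max_length=77, return_tensors="pt").to(device)
    image_embs = model.get_image_features(**image_inputs)
    text_embs = model.get_text_features(**text_inputs)
    image_embs = image_embs / image_embs.norm(dim=-1, keepdim=True)
    text_embs = text_embs / text_embs.norm(dim=-1, keepdim=True)
    # Cosine similarity of each prompt with its own image, scaled by the
    # learned logit scale (as in CLIP-style scoring).
    scores = model.logit_scale.exp() * (text_embs * image_embs).sum(dim=-1)
    return scores.cpu().numpy()
```

A wrapper like this could be dropped in as the single reward, since PickScore was trained on human preferences that reflect both prompt faithfulness and visual quality.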