Reward function for both aesthetics and prompt alignment
scarbain opened this issue
Hi,
Wonderful project, congratulations!
Have you tried using a reward function that covers both objectives? The aesthetic reward does make the generations look better, but it also seems to oversimplify them (placing a single subject in the center with a blurred background).
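One simple way to get a single reward for both objectives is a weighted sum of the two scores. Because an aesthetic score and a prompt-alignment score usually live on very different scales, the sketch below standardizes each one within the batch before mixing so that neither dominates. This is an illustrative assumption, not this project's method: `standardize`, `combined_reward`, and the 0.5/0.5 weights are hypothetical.

```python
from statistics import mean, pstdev


def standardize(scores, eps=1e-8):
    """Rescale a batch of scores to zero mean and unit variance.

    eps guards against division by zero when all scores are equal.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / (sigma + eps) for s in scores]


def combined_reward(aesthetic, alignment, w_aesthetic=0.5, w_alignment=0.5):
    """Hypothetical helper: weighted sum of batch-standardized scores.

    The weights are illustrative assumptions, not a tuned recipe.
    """
    if len(aesthetic) != len(alignment):
        raise ValueError("score lists must have the same length")
    z_aes = standardize(aesthetic)
    z_aln = standardize(alignment)
    return [w_aesthetic * a + w_alignment * p for a, p in zip(z_aes, z_aln)]


# Aesthetic scores and alignment scores on very different scales.
rewards = combined_reward([5.0, 6.0, 7.0], [0.30, 0.20, 0.25])
print(rewards)
```

Raising `w_alignment` relative to `w_aesthetic` is the knob that would push generations back toward following the prompt when the aesthetic term starts oversimplifying the composition.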
Also, do you have any insights on how to plug in a new reward function? Should it be normalized to a certain min-max range? How long does it take to train each model, and does that differ between reward functions?
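On the normalization question, one common option in policy-gradient fine-tuning of diffusion models (assuming a DDPO-style setup; this thread does not confirm how this project handles it) is to turn raw rewards into advantages normalized per prompt. Under that scheme, any positive rescaling or shift of the reward leaves the advantages essentially unchanged, so a fixed min-max range matters less. A minimal sketch; `per_prompt_advantages` is a hypothetical helper:

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_prompt_advantages(prompts, rewards, eps=1e-6):
    """Hypothetical helper: normalize rewards within each prompt's group.

    Each reward becomes (reward - group mean) / (group std + eps),
    where the group is all samples generated from the same prompt.
    """
    groups = defaultdict(list)
    for prompt, reward in zip(prompts, rewards):
        groups[prompt].append(reward)
    stats = {p: (mean(rs), pstdev(rs)) for p, rs in groups.items()}
    return [
        (r - stats[p][0]) / (stats[p][1] + eps)
        for p, r in zip(prompts, rewards)
    ]


prompts = ["cat", "cat", "dog", "dog"]
rewards = [1.0, 3.0, 10.0, 20.0]
adv = per_prompt_advantages(prompts, rewards)
# Rescaling and shifting the rewards gives (almost) the same advantages.
adv_scaled = per_prompt_advantages(prompts, [100 * r + 7 for r in rewards])
print(adv, adv_scaled)
```

A prompt with only one sample gets an advantage of 0 here, since its reward equals its own group mean; in practice that is why such trackers usually keep a history of rewards per prompt rather than using a single batch.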
Thanks :)
For both prompt alignment and aesthetics, CarperAI used the PickScore model: https://github.com/CarperAI/DRLX