jannerm / ddpo

Code for the paper "Training Diffusion Models with Reinforcement Learning"

Home Page: https://rl-diffusion.github.io

Reward function for both aesthetics and prompt alignment

scarbain opened this issue

Hi,

Wonderful project, congratulations!

Have you tried using a reward function that covers both objectives (one possible combination is sketched below)? The aesthetic reward does make the generations look better, but it also oversimplifies them, posing a single subject in the center with a blurred background.

Also, do you have any insights on how to plug in a new reward function? Should it be normalized to a particular min-max range? How long does it take to train each model, and does that vary across reward functions?

Thanks :)
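As a rough illustration of the combined reward the question asks about, here is a minimal sketch. It assumes two hypothetical callables, `aesthetic_reward` and `alignment_reward`, that map a batch of images and prompts to per-sample scores; these names are placeholders, not part of the ddpo codebase, and the per-batch standardization is just one plausible way to keep the two scales comparable, not something prescribed by the paper.

```python
import numpy as np

def combined_reward(images, prompts, aesthetic_reward, alignment_reward,
                    weight=0.5, eps=1e-6):
    """Weighted sum of two reward signals.

    Each signal is standardized within the batch so that neither scale
    dominates the combination. `weight` trades off aesthetics vs. alignment.
    """
    a = np.asarray(aesthetic_reward(images, prompts), dtype=np.float32)
    b = np.asarray(alignment_reward(images, prompts), dtype=np.float32)
    # Per-batch standardization (an assumption, not the paper's recipe).
    a = (a - a.mean()) / (a.std() + eps)
    b = (b - b.mean()) / (b.std() + eps)
    return weight * a + (1.0 - weight) * b
```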

For both prompt alignment and aesthetics, CarperAI used the PickScore model: https://github.com/CarperAI/DRLX
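For reference, here is a rough sketch of scoring (image, prompt) pairs with PickScore through HuggingFace `transformers`. The model and processor IDs follow the public PickScore release; `pickscore_reward` is an illustrative wrapper name, not code from DRLX or ddpo, so treat the exact preprocessing arguments as assumptions.

```python
import torch
from transformers import AutoProcessor, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# IDs taken from the public PickScore release.
processor = AutoProcessor.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
model = AutoModel.from_pretrained("yuvalkirstain/PickScore_v1").eval().to(device)

@torch.no_grad()
def pickscore_reward(images, prompts):
    """Score each (image, prompt) pair with PickScore.

    `images` is a list of PIL images and `prompts` a list of strings of the
    same length; returns one score per pair.
    """
    image_inputs = processor(images=images, return_tensors="pt").to(device)
    text_inputs = processor(text=prompts, padding=True, truncation=True,
                            max_length=77, return_tensors="pt").to(device)
    image_embs = model.get_image_features(**image_inputs)
    text_embs = model.get_text_features(**text_inputs)
    image_embs = image_embs / image_embs.norm(dim=-1, keepdim=True)
    text_embs = text_embs / text_embs.norm(dim=-1, keepdim=True)
    # Cosine similarity of each prompt with its own image, scaled by the
    # learned logit scale (as in CLIP-style scoring).
    scores = model.logit_scale.exp() * (text_embs * image_embs).sum(dim=-1)
    return scores.cpu().numpy()
```

A wrapper like this could be dropped in as the single reward, since PickScore was trained on human preferences that reflect both prompt faithfulness and visual quality.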