Possible bug in implementation of PPO Clipped Surrogate Objective
aishwaryap opened this issue
I believe there might be a bug in the implementation of the Clipped Surrogate Objective in PPO here.
According to Equation 7 in the PPO paper, I would expect that line to be
pg_loss = tf.reduce_mean(tf.minimum(pg_losses, pg_losses2))
If this is not a bug, I would appreciate it if someone could explain why the code uses maximum instead.
Hi Aishwarya, the paper's Equation 7 defines an objective to be maximized, whereas the code computes a loss to be minimized. The surrogate terms in the code are therefore negated, and since min(a, b) = -max(-a, -b), taking the maximum of the negated terms is equivalent to taking the minimum in the paper. You can verify that if you flip it the wrong way, the algorithm doesn't work at all.
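The equivalence can be sketched numerically. This is a minimal illustration using NumPy rather than the repository's TensorFlow code; the values of `ratio` and `advantages` are made up for the example:

```python
import numpy as np

# Illustrative probability ratios and advantage estimates (not from the repo).
ratio = np.array([0.8, 1.0, 1.3])
advantages = np.array([1.0, -0.5, 2.0])
clip_eps = 0.2
clipped_ratio = np.clip(ratio, 1 - clip_eps, 1 + clip_eps)

# Paper's Equation 7: an objective to MAXIMIZE, using the element-wise minimum.
objective = np.mean(np.minimum(ratio * advantages, clipped_ratio * advantages))

# Code's formulation: negate each surrogate term to get a loss to MINIMIZE,
# which turns the element-wise minimum into a maximum.
pg_losses = -advantages * ratio
pg_losses2 = -advantages * clipped_ratio
pg_loss = np.mean(np.maximum(pg_losses, pg_losses2))

# Minimizing pg_loss is exactly maximizing the paper's objective:
assert np.isclose(pg_loss, -objective)
```

The identity min(a, b) = -max(-a, -b) is what makes the two formulations agree term by term, so the code's `tf.maximum` over negated surrogates is not a bug.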