Possible bug in implementation of PPO Clipped Surrogate Objective
aishwaryap opened this issue · comments
Aishwarya Padmakumar commented
I believe there might be a bug in the implementation of the Clipped Surrogate Objective in PPO here.
According to Equation 7 in the PPO paper, I would expect that line to be
pg_loss = tf.reduce_mean(tf.minimum(pg_losses, pg_losses2))
In case this is not a bug, I would appreciate it if someone could explain why the code uses `tf.maximum` instead.
John Schulman commented
Hi Aishwarya, the paper used an objective that should be maximized, whereas the code uses a loss that should be minimized. You can verify that if you flip it the wrong way, the algorithm doesn't work at all.
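The sign flip can be checked numerically. A minimal numpy sketch (the values and variable names are illustrative, not taken from the baselines code): because `max(-a, -b) == -min(a, b)`, negating the two surrogate terms and taking the maximum gives exactly the negated paper objective, so minimizing the loss maximizes Equation 7.

```python
import numpy as np

# Hypothetical advantage estimates and probability ratios
# (illustrative values, not from the actual codebase).
adv = np.array([1.0, -2.0, 0.5])
ratio = np.array([1.3, 0.6, 1.05])
clip_eps = 0.2

# Paper's Equation 7: an objective to MAXIMIZE.
objective = np.minimum(
    ratio * adv,
    np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv,
).mean()

# Code-style formulation: negate each term to get a loss to MINIMIZE.
pg_losses = -adv * ratio
pg_losses2 = -adv * np.clip(ratio, 1 - clip_eps, 1 + clip_eps)
pg_loss = np.maximum(pg_losses, pg_losses2).mean()

# max(-a, -b) == -min(a, b), so the loss is the negated objective.
assert np.isclose(pg_loss, -objective)
```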