Possible bug in implementation of PPO Clipped Surrogate Objective
aishwaryap opened this issue
I believe there might be a bug in the implementation of the Clipped Surrogate Objective in PPO here.
According to Equation 7 in the PPO paper, I would expect that line to be
pg_loss = tf.reduce_mean(tf.minimum(pg_losses, pg_losses2))
If this is not a bug, I would appreciate it if someone could explain why the code uses maximum instead.
Hi Aishwarya, the paper's Equation 7 defines an objective to be maximized, whereas the code computes a loss to be minimized. The surrogate terms in the code are therefore negated, and since min(a, b) = -max(-a, -b), taking the maximum of the negated terms is equivalent to taking the minimum in the paper. You can verify that if you flip it the wrong way, the algorithm doesn't work at all.
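The equivalence can be sketched numerically. This is a minimal illustration using NumPy rather than the repository's TensorFlow code; the values of `ratio` and `advantages` are made up for the example:

```python
import numpy as np

# Illustrative probability ratios and advantage estimates (not from the repo).
ratio = np.array([0.8, 1.0, 1.3])
advantages = np.array([1.0, -0.5, 2.0])
clip_eps = 0.2
clipped_ratio = np.clip(ratio, 1 - clip_eps, 1 + clip_eps)

# Paper's Equation 7: an objective to MAXIMIZE, using the element-wise minimum.
objective = np.mean(np.minimum(ratio * advantages, clipped_ratio * advantages))

# Code's formulation: negate each surrogate term to get a loss to MINIMIZE,
# which turns the element-wise minimum into a maximum.
pg_losses = -advantages * ratio
pg_losses2 = -advantages * clipped_ratio
pg_loss = np.mean(np.maximum(pg_losses, pg_losses2))

# Minimizing pg_loss is exactly maximizing the paper's objective:
assert np.isclose(pg_loss, -objective)
```

The identity min(a, b) = -max(-a, -b) is what makes the two formulations agree term by term, so the code's `tf.maximum` over negated surrogates is not a bug.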