openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Possible bug in implementation of PPO Clipped Surrogate Objective

aishwaryap opened this issue · comments

I believe there might be a bug in the implementation of the Clipped Surrogate Objective in PPO here.
According to Equation 7 in the PPO paper, I would expect that line to be

pg_loss = tf.reduce_mean(tf.minimum(pg_losses, pg_losses2))

In case this is not a bug, I would appreciate it if someone could explain why the code uses `maximum` instead.
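For reference, Equation 7 of the PPO paper defines the clipped surrogate objective, which is maximized:

$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]$$

where $r_t(\theta) = \pi_\theta(a_t \mid s_t)/\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)$ is the probability ratio and $\hat{A}_t$ is the advantage estimate.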

Hi Aishwarya, the paper uses an objective that should be maximized, whereas the code uses a loss that should be minimized. Each term in the loss is the negated objective term, and since $\min(a, b) = -\max(-a, -b)$, the `minimum` in the paper becomes a `maximum` in the code. You can verify that if you flip it the wrong way, the algorithm doesn't work at all.
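The equivalence is easy to check numerically. Below is a minimal NumPy sketch (the function names `ppo_objective` and `ppo_loss` are illustrative, not taken from the baselines repo) showing that the baselines-style loss is exactly the negated paper objective:

```python
import numpy as np

def ppo_objective(ratio, adv, clip_eps=0.2):
    # Paper's Equation 7: a quantity to MAXIMIZE.
    surr1 = ratio * adv
    surr2 = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    return np.minimum(surr1, surr2).mean()

def ppo_loss(ratio, adv, clip_eps=0.2):
    # Baselines-style formulation: each surrogate term is negated,
    # so the quantity becomes a loss to MINIMIZE and min -> max.
    pg_losses = -adv * ratio
    pg_losses2 = -adv * np.clip(ratio, 1 - clip_eps, 1 + clip_eps)
    return np.maximum(pg_losses, pg_losses2).mean()

rng = np.random.default_rng(0)
ratio = rng.uniform(0.5, 1.5, size=1000)  # stand-in probability ratios
adv = rng.normal(size=1000)               # stand-in advantage estimates

# The loss is the negated objective, term by term.
assert np.isclose(ppo_loss(ratio, adv), -ppo_objective(ratio, adv))
```

Minimizing `ppo_loss` with gradient descent therefore performs the same update as maximizing the paper's objective; only the sign convention differs.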