luchris429 / purejaxrl

Really Fast End-to-End Jax RL Implementations

Question about critic loss

qlan3 opened this issue

I noticed that the critic loss implemented in PPO (https://github.com/luchris429/purejaxrl/blob/main/purejaxrl/ppo.py#L179) is quite different from the traditional TD error and looks more like PPO's clipped actor loss. Could you please point me to a reference? If there is no such reference, is there a reason for doing it this way?

Hello! Good question. The code is inspired by CleanRL's implementation, which itself comes from OpenAI's original implementation.

Costa Huang (author of CleanRL) did an amazing write-up about implementation details here. In Point 9 of the first section he brings up value function loss clipping! Notably, works investigating it find that it does not help performance and can sometimes even harm it. However, I include it for the same reasons that Costa does.
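For anyone reading along, here is roughly what value function loss clipping looks like next to a plain squared-error critic loss. This is a minimal JAX sketch of the general technique, not a copy of the line linked above; the function names `clipped_value_loss` and `plain_value_loss`, the argument names, and the default `clip_eps=0.2` are assumptions made for illustration.

```python
import jax.numpy as jnp


def plain_value_loss(value, targets):
    """Ordinary squared-error critic loss, for comparison."""
    return 0.5 * jnp.square(value - targets).mean()


def clipped_value_loss(value, old_value, targets, clip_eps=0.2):
    """PPO-style clipped value loss (illustrative sketch).

    value:     critic predictions under the current parameters
    old_value: critic predictions stored when the rollout was collected
    targets:   value targets (for example, advantages + old_value)
    clip_eps:  hypothetical default; use whatever your config specifies
    """
    # Limit how far the new prediction may move away from the old one.
    value_clipped = old_value + jnp.clip(value - old_value, -clip_eps, clip_eps)
    loss_unclipped = jnp.square(value - targets)
    loss_clipped = jnp.square(value_clipped - targets)
    # Take the element-wise larger (more pessimistic) of the two errors.
    return 0.5 * jnp.maximum(loss_unclipped, loss_clipped).mean()
```

The structure mirrors the clipped actor objective, which is why it looks unlike a plain TD error: the prediction is kept within `clip_eps` of the value recorded at rollout time, and the larger of the clipped and unclipped errors is used. When the new predictions have not moved from the old ones, both functions return the same number.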

Thank you for your quick and helpful reply!