Question about critic loss

qlan3 opened this issue 8 months ago

I noticed that the critic loss implemented in PPO (https://github.com/luchris429/purejaxrl/blob/main/purejaxrl/ppo.py#L179) is quite different from the traditional TD error and looks more like PPO's clipped actor loss. Could you please point me to a reference? If there is no such reference, what is the reason for doing it this way?

Chris Lu · Answer 1 · Mon Nov 06 2023 23:50:16 GMT+0800 (China Standard Time)

Hello! Good question. The code is inspired by CleanRL's implementation, which itself comes from OpenAI's original implementation.

Costa Huang (author of CleanRL) did an amazing write-up about implementation details here. In Point 9 of the first section, he brings up value function loss clipping! Notably, works investigating it find that it does not help performance and can sometimes even harm it. However, I include it for the same reasons that Costa does.
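For anyone reading along, here is a minimal sketch of what value function loss clipping looks like in JAX. It is an illustration of the general technique, not a copy of the linked line in `ppo.py`; the function name `clipped_value_loss` and its argument names are made up for this example, and the default `clip_eps=0.2` is only an assumed value.

```python
import jax.numpy as jnp


def clipped_value_loss(value, old_value, targets, clip_eps=0.2):
    """Clipped value loss in the style of PPO's clipped objective (sketch).

    value:     critic predictions from the network being updated
    old_value: critic predictions stored when the rollout was collected
    targets:   value targets (e.g. advantages + old values)
    clip_eps:  clipping radius (assumed default, hypothetical)
    """
    # Keep the new prediction within clip_eps of the rollout-time prediction.
    value_clipped = old_value + jnp.clip(value - old_value, -clip_eps, clip_eps)
    # Squared errors for the unclipped and the clipped predictions.
    loss_unclipped = jnp.square(value - targets)
    loss_clipped = jnp.square(value_clipped - targets)
    # Take the elementwise maximum (the pessimistic choice), then average.
    return 0.5 * jnp.maximum(loss_unclipped, loss_clipped).mean()


# Toy batch: the first prediction moved by 1.0 since the rollout, the second did not move.
value = jnp.array([1.0, 2.0])
old_value = jnp.array([0.0, 2.0])
targets = jnp.array([2.0, 2.0])

loss = clipped_value_loss(value, old_value, targets, clip_eps=0.2)
unchanged_loss = clipped_value_loss(old_value, old_value, targets, clip_eps=0.2)
```

In the toy batch, the first prediction is clipped to 0.2, so its squared error against the target (about 3.24) is larger than the unclipped one (1.0) and the maximum picks the clipped term. When the prediction has not moved from `old_value`, the clip has no effect and the result is just half the mean squared error, which is why this looks like the actor's clipped objective applied to a plain regression loss.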

Qingfeng Lan · Answer 2 · Tue Nov 07 2023 00:28:40 GMT+0800 (China Standard Time)

Thank you for your quick and helpful reply!