Stupid issue

Question

opened this issue 6 years ago · comments

Hi

I implemented my own version of SAC, and the log probability of the policy sometimes went above 0 when using the version given in the paper.

According to what I read here (Pg6), I think the squashing correction should be added, not subtracted, since the density is multiplied by the determinant of the Jacobian when calculating the pdf.
But then this incentivises the agent to just set actions to 1 to get a low log pi.

I am pretty sure I am missing something here. Can you please explain how you arrived at the squashing correction given in the paper?

Deleted user · Answer 1 · Sat Oct 13 2018 05:39:06 GMT+0800 (China Standard Time)

Sorry, I got confused about some very basic things.
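
For anyone landing here with the same doubt, here is a minimal sketch of the change of variables, assuming a diagonal Gaussian over the pre-squash action u and a = tanh(u). The function names and the `eps` parameter are made up for illustration and are not taken from any SAC codebase. Since u -> a has Jacobian entries da_i/du_i = 1 - tanh(u_i)^2, the density of a is the Gaussian density divided by the Jacobian determinant, so the log term is subtracted. The check at the end integrates the resulting 1-D density numerically.

```python
import numpy as np

def gaussian_log_prob(u, mean, std):
    """Log density of a diagonal Gaussian at u, summed over action dimensions."""
    z = (u - mean) / std
    return np.sum(-0.5 * z**2 - np.log(std) - 0.5 * np.log(2.0 * np.pi), axis=-1)

def squashed_log_prob(u, mean, std, eps=1e-6):
    """Log density of a = tanh(u), where u ~ N(mean, std^2).

    Change of variables: pi(a) = mu(u) / |det(da/du)|, with
    da_i/du_i = 1 - tanh(u_i)^2, so the log-Jacobian is subtracted.
    eps is a hypothetical stabiliser for log(0) when |a| is close to 1.
    """
    a = np.tanh(u)
    log_det = np.sum(np.log(1.0 - a**2 + eps), axis=-1)
    return gaussian_log_prob(u, mean, std) - log_det

# 1-D sanity check: the density over a should integrate to about 1 on (-1, 1).
mean, std = np.array([0.3]), np.array([0.5])
a_grid = np.linspace(-0.999, 0.999, 20001)
u_grid = np.arctanh(a_grid)[:, None]          # shape (20001, 1)
density = np.exp(squashed_log_prob(u_grid, mean, std, eps=0.0))
# Trapezoid rule written out by hand to avoid NumPy version differences.
integral = np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(a_grid))

# A log density above 0 is legitimate for a continuous distribution.
narrow = squashed_log_prob(np.array([0.0]), np.array([0.0]), np.array([0.1]), eps=0.0)
```

Because log(1 - a^2) is never positive, subtracting it can only raise log pi relative to the Gaussian term, and a narrow enough policy gives a positive log density even without squashing. So values above 0 are not a bug by themselves.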