haarnoja / sac

Soft Actor-Critic

A mathematical problem

bofen97 opened this issue

I differentiated Equation 12, but my result does not match Equation 13 in your paper. My derivation does not produce the first term of Equation 13, and I can't find where I went wrong.
Can you help me?
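For reference, here are the equations being compared, written out as we recall them from the arXiv version of the SAC paper (the numbering is an assumption and may differ between versions of the paper). The last line is the chain-rule expansion that produces the first term of Equation 13.

```latex
\begin{align}
a_t &= f_\phi(\epsilon_t; s_t) \tag{11} \\
J_\pi(\phi) &= \mathbb{E}_{s_t \sim \mathcal{D},\, \epsilon_t \sim \mathcal{N}}
  \Big[ \log \pi_\phi\big(f_\phi(\epsilon_t; s_t) \mid s_t\big)
        - Q_\theta\big(s_t, f_\phi(\epsilon_t; s_t)\big) \Big] \tag{12} \\
\hat{\nabla}_\phi J_\pi(\phi) &= \nabla_\phi \log \pi_\phi(a_t \mid s_t)
  + \big( \nabla_{a_t} \log \pi_\phi(a_t \mid s_t) - \nabla_{a_t} Q(s_t, a_t) \big)
    \nabla_\phi f_\phi(\epsilon_t; s_t) \tag{13} \\
\frac{d}{d\phi} \log \pi_\phi\big(f_\phi(\epsilon_t; s_t) \mid s_t\big)
  &= \underbrace{\nabla_\phi \log \pi_\phi(a_t \mid s_t)\big|_{a_t = f_\phi(\epsilon_t; s_t)}}_{\text{$a_t$ held fixed}}
   + \nabla_{a_t} \log \pi_\phi(a_t \mid s_t)\, \nabla_\phi f_\phi(\epsilon_t; s_t)
\end{align}
```

Q_\theta has no direct dependence on \phi, so it contributes only through a_t, which is why no \nabla_\phi Q term appears in Equation 13.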

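Thanks for the question. Do you mean Equation 13 in this paper? It is the total derivative of J(\phi) with respect to the policy parameters \phi. Note that both \pi_\phi and a_t = f_\phi depend on these parameters, so we need to differentiate with respect to both, using the chain rule for the latter. The first term of Equation 13 is the derivative of \log \pi_\phi(a_t|s_t) with a_t held fixed; the remaining terms come from differentiating through a_t = f_\phi(\epsilon_t; s_t). It is exactly analogous to this example.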
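A quick numerical sanity check of this answer, offered as a sketch under loudly stated assumptions: a hypothetical one-dimensional Gaussian policy with a_t = \mu + \sigma \epsilon_t, parameters \phi = (\mu, \log\sigma), and a made-up critic Q(s, a) = -(a - 1)^2 for one fixed state. None of this comes from the haarnoja/sac code. The script compares a finite-difference estimate of the total derivative of the single-sample objective inside Equation 12 against Equation 13, with and without its first term.

```python
import math

# Hypothetical toy setup (not from haarnoja/sac): 1-D Gaussian policy,
# phi = (mu, log_sigma), reparameterized as a = mu + sigma * eps.


def f(mu, log_sigma, eps):
    """Reparameterized action a = f_phi(eps) = mu + sigma * eps."""
    return mu + math.exp(log_sigma) * eps


def log_pi(a, mu, log_sigma):
    """Log-density of N(mu, sigma^2) at a, with a treated as an independent input."""
    sigma = math.exp(log_sigma)
    return -0.5 * ((a - mu) / sigma) ** 2 - log_sigma - 0.5 * math.log(2 * math.pi)


def q(a):
    """Made-up critic Q(s, a) = -(a - 1)^2 for a single fixed state s."""
    return -(a - 1.0) ** 2


def objective(mu, log_sigma, eps):
    """Single-sample integrand of Eq. 12: log pi_phi(f_phi(eps)) - Q(f_phi(eps))."""
    a = f(mu, log_sigma, eps)
    return log_pi(a, mu, log_sigma) - q(a)


def eq13_grad(mu, log_sigma, eps, include_first_term=True):
    """Eq. 13 for this toy policy; optionally drop the first (a-held-fixed) term."""
    sigma = math.exp(log_sigma)
    a = f(mu, log_sigma, eps)
    # First term: partial derivatives of log pi w.r.t. phi with a held fixed.
    dlogpi_dmu = (a - mu) / sigma ** 2
    dlogpi_dlogsigma = ((a - mu) / sigma) ** 2 - 1.0
    # Derivatives with respect to the action a.
    dlogpi_da = -(a - mu) / sigma ** 2
    dq_da = -2.0 * (a - 1.0)
    # Jacobian of the reparameterization a = mu + sigma * eps.
    df_dmu = 1.0
    df_dlogsigma = sigma * eps
    # Chain-rule term: (d log pi/da - dQ/da) * df/dphi.
    g_mu = (dlogpi_da - dq_da) * df_dmu
    g_ls = (dlogpi_da - dq_da) * df_dlogsigma
    if include_first_term:
        g_mu += dlogpi_dmu
        g_ls += dlogpi_dlogsigma
    return g_mu, g_ls


def finite_diff_grad(mu, log_sigma, eps, h=1e-6):
    """Central-difference estimate of the total derivative of objective()."""
    g_mu = (objective(mu + h, log_sigma, eps) - objective(mu - h, log_sigma, eps)) / (2 * h)
    g_ls = (objective(mu, log_sigma + h, eps) - objective(mu, log_sigma - h, eps)) / (2 * h)
    return g_mu, g_ls


if __name__ == "__main__":
    mu, log_sigma, eps = 0.3, -0.5, 0.7
    print("total derivative (finite diff):", finite_diff_grad(mu, log_sigma, eps))
    print("Eq. 13 with first term:        ", eq13_grad(mu, log_sigma, eps))
    print("Eq. 13 without first term:     ", eq13_grad(mu, log_sigma, eps, False))
```

With the first term included, both gradient components agree with the finite-difference estimate; dropping it changes both components, which is the mismatch described in the question.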