Question on Eq. 4
denkorzh opened this issue
Hi, thanks for sharing your results!
I'm afraid I did not really get how you derived Eq. 4: if I'm not mistaken,
∇ log p(c ∣ xₜ) = − λ ∇ ℰ (c, xₜ) + λ 𝔼 [ ∇ ℰ (c, xₜ) ],
where the gradient ∇ is w.r.t. xₜ, and the expectation 𝔼 is over p(c ∣ xₜ).
Why have you decided to ignore the second term? Thank you in advance!
Hi @denkorzh, thanks for your attention! I obtained Eq. 4 by treating Z in Eq. 3 as a constant, so that term is discarded after computing $\nabla_{\mathbf{x}_t}\log(\cdot)$. Could you tell me how you derived the expectation term? Perhaps we can find some better insights from it to help us improve our work. Looking forward to your reply :)
@vvictoryuki I have a similar confusion.
Writing the normalizing constant as
$$
Z(x_t) = \int \exp(-\lambda \mathcal{E}(c', x_t)) \, dc'
$$
and plugging it into $\log p(c|x_t) = -\lambda \mathcal{E}(c, x_t) - \log Z(x_t)$ gives
$$
\nabla \log p(c|x_t) = -\lambda \nabla \mathcal{E}(c, x_t) - \frac{\nabla \int \exp(-\lambda \mathcal{E}(c', x_t)) \, dc'}{Z(x_t)}
$$
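But the term Z, more specifically the gradient term in the numerator, depends on $x_t$, so it cannot be treated as a constant and dropped.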
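For reference, here is a sketch of where the expectation term comes from. It assumes Eq. 3 has the form $p(c|x_t) = \exp(-\lambda \mathcal{E}(c, x_t)) / Z(x_t)$, with $Z(x_t)$ obtained by integrating (or summing) the unnormalized density over all conditions, and that the gradient can be moved inside the integral. The symbol $c'$ is only a dummy integration variable introduced here.
$$
\nabla \log Z(x_t) = \frac{1}{Z(x_t)} \int \nabla \exp(-\lambda \mathcal{E}(c', x_t)) \, dc' = -\lambda \int \frac{\exp(-\lambda \mathcal{E}(c', x_t))}{Z(x_t)} \nabla \mathcal{E}(c', x_t) \, dc' = -\lambda \, \mathbb{E}_{c' \sim p(c'|x_t)} [\nabla \mathcal{E}(c', x_t)]
$$
so that
$$
\nabla \log p(c|x_t) = -\lambda \nabla \mathcal{E}(c, x_t) - \nabla \log Z(x_t) = -\lambda \nabla \mathcal{E}(c, x_t) + \lambda \, \mathbb{E}_{c' \sim p(c'|x_t)} [\nabla \mathcal{E}(c', x_t)]
$$
Under these assumptions, the second term vanishes only when the expected energy gradient under $p(c'|x_t)$ is zero.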
@vvictoryuki
Exactly, as @xjtupanda mentioned above, Z is a constant w.r.t. c but is a function of x_t.
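A quick numerical check of this point, as a minimal sketch rather than anything from the paper: it assumes a toy setting with three discrete conditions, a scalar x_t, and a quadratic energy. The names `LAM`, `CENTERS`, `energy`, and the helper functions are all hypothetical and chosen only for illustration. It compares a finite-difference gradient of log p(c | x_t) with the gradient obtained when Z is treated as a constant and with the gradient that keeps the log Z term.

```python
import math

LAM = 1.5                     # hypothetical value of lambda
CENTERS = [-1.0, 0.5, 2.0]    # hypothetical: one energy minimum per condition c


def energy(c, x):
    """Toy quadratic energy E(c, x); not the energy used in the paper."""
    return (x - CENTERS[c]) ** 2


def grad_energy(c, x):
    """Analytic derivative of the toy energy with respect to x."""
    return 2.0 * (x - CENTERS[c])


def posterior(x):
    """p(c | x) = exp(-LAM * E(c, x)) / Z(x), with Z(x) a sum over all c."""
    weights = [math.exp(-LAM * energy(k, x)) for k in range(len(CENTERS))]
    z = sum(weights)
    return [w / z for w in weights]


def log_p(c, x):
    """log p(c | x), including the x-dependent normalizer."""
    return math.log(posterior(x)[c])


def grad_numeric(c, x, h=1e-6):
    """Central finite difference of log p(c | x) with respect to x."""
    return (log_p(c, x + h) - log_p(c, x - h)) / (2.0 * h)


def grad_without_z(c, x):
    """Gradient when Z is treated as a constant: -LAM * dE/dx."""
    return -LAM * grad_energy(c, x)


def grad_with_z(c, x):
    """Gradient that keeps log Z: adds LAM * E_{c' ~ p(c'|x)}[dE(c', x)/dx]."""
    p = posterior(x)
    expected = sum(p[k] * grad_energy(k, x) for k in range(len(CENTERS)))
    return grad_without_z(c, x) + LAM * expected


c, x = 0, 0.3
print("finite difference:", grad_numeric(c, x))
print("Z kept           :", grad_with_z(c, x))
print("Z dropped        :", grad_without_z(c, x))
```

In this toy case the version that keeps log Z agrees with the finite-difference gradient, while the version that drops it differs by the expectation term. How much that gap matters in practice depends on the actual energy function, which this sketch does not model.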
Thank you for pointing out the shortcomings in our work. We will address the relevant inaccuracies in the latest arXiv version.