LSTM + PPO value fitting

Question

hnshahao opened this issue 5 years ago

Hello, thanks for your great work!
I have one dumb question.
In the LSTM PPO implementation, I noticed that the same first_hidden value is used when calculating both v_prime and v_s. My question is: should v_prime use a different first hidden value, or is this just an approximation?
Thank you!

v_prime = self.v(s_prime, first_hidden).squeeze(1)
td_target = r + gamma * v_prime * done_mask
v_s = self.v(s, first_hidden).squeeze(1)

Seungeun Rho · Answer 1 · Tue Aug 27 2019 09:15:16 GMT+0800 (China Standard Time)

It's not dumb at all.
It seems I made a critical mistake.
I will modify the code to use the second hidden state when calculating v_prime.
Thanks for the great idea.
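
For anyone reading this later, here is a minimal sketch of why the second hidden state matters. It is not the repository's code: a scalar tanh cell stands in for the LSTM, and the names `cell`, `td_targets`, the weights, and the sample data are all made up for illustration. It assumes s holds s_0..s_{T-1} and s_prime holds s_1..s_T, so the recurrent state that belongs with s_prime[0] is the one produced after the cell has consumed s[0].

```python
import math

# Toy stand-ins (hypothetical, not the repository's model): a scalar
# recurrent cell plays the role of the LSTM, and a linear readout
# plays the role of the value head.
def cell(x, h, w_x=0.5, w_h=0.8):
    """Return the next hidden state given observation x and hidden state h."""
    return math.tanh(w_x * x + w_h * h)

def v(xs, hidden, w_v=2.0):
    """Unroll the cell over the sequence xs starting from `hidden`
    and return one value estimate per step."""
    values = []
    for x in xs:
        hidden = cell(x, hidden)
        values.append(w_v * hidden)
    return values

def td_targets(states, rewards, done_mask, first_hidden, gamma=0.98):
    """Compute v_s and the TD target with hidden states aligned to each sequence."""
    s, s_prime = states[:-1], states[1:]
    # Hidden state after the cell has consumed the first observation s[0].
    second_hidden = cell(s[0], first_hidden)
    v_s = v(s, first_hidden)
    v_prime = v(s_prime, second_hidden)
    td_target = [r + gamma * vp * m
                 for r, vp, m in zip(rewards, v_prime, done_mask)]
    return v_s, v_prime, td_target

states = [1.0, -0.5, 0.3, 0.9]   # s_0 .. s_3 (made-up observations)
rewards = [1.0, 1.0, 1.0]
done_mask = [1.0, 1.0, 0.0]      # 0.0 marks a terminal transition
v_s, v_prime, td_target = td_targets(states, rewards, done_mask, first_hidden=0.0)

# With aligned hidden states, the estimate for s_1 is the same whether it
# appears as s_prime[0] or as s[1].
print(math.isclose(v_prime[0], v_s[1]))
# Starting s_prime from first_hidden instead gives a different estimate.
print(math.isclose(v(states[1:], 0.0)[0], v_s[1]))
```

In this toy setup, reusing first_hidden for s_prime makes the network evaluate s_1 as if it were the first observation of the episode, so the bootstrap term in td_target is computed from a state the policy never actually had at that step.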