LSTM + PPO value fitting
hnshahao opened this issue
Hello, thanks for your great work!
I have one dumb question.
In the LSTM PPO implementation, I noticed that the same first_hidden value is used when calculating both v_prime and v_s. My question is: should v_prime use a different initial hidden value, or is this just an approximation?
Thank you!
v_prime = self.v(s_prime, first_hidden).squeeze(1)
td_target = r + gamma * v_prime * done_mask
v_s = self.v(s, first_hidden).squeeze(1)
It's not dumb at all.
It seems I made a critical mistake.
I will modify the code to use the second hidden state when calculating v_prime.
Thanks for the great idea.
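To illustrate why the second hidden state matters, here is a minimal sketch. It is not the repository's code: it replaces the LSTM with a toy scalar recurrent cell, and the functions `step` and `values`, the weights, and the trajectory are all hypothetical. It only shows that unrolling over s_prime from the hidden state obtained after the first observation reproduces the estimates of v_s shifted by one step, while reusing first_hidden does not.

```python
import math

def step(x, h, w_x=0.5, w_h=0.8):
    """One step of a toy scalar recurrent cell (a stand-in for the LSTM)."""
    return math.tanh(w_x * x + w_h * h)

def values(states, hidden, w_v=2.0):
    """Unroll the cell over `states` from `hidden`; return one value per step."""
    out = []
    for x in states:
        hidden = step(x, hidden)
        out.append(w_v * hidden)
    return out

# A rollout of 4 transitions s[t] -> s_prime[t], where s_prime[t] == s[t + 1].
trajectory = [0.1, -0.4, 0.7, 0.2, -0.3]
s, s_prime = trajectory[:-1], trajectory[1:]

first_hidden = 0.0                        # hidden state before s[0] is seen
second_hidden = step(s[0], first_hidden)  # hidden state after s[0] is consumed

v_s = values(s, first_hidden)
v_prime = values(s_prime, second_hidden)       # starts where s_prime starts
v_prime_stale = values(s_prime, first_hidden)  # the pattern questioned above

# With the second hidden state, v_prime[t] is the same estimate as v_s[t + 1].
consistent = all(math.isclose(a, b) for a, b in zip(v_prime, v_s[1:]))
stale_matches = all(math.isclose(a, b) for a, b in zip(v_prime_stale, v_s[1:]))
print(consistent, stale_matches)
```

In the real model, the analogous change would presumably be to store the hidden state produced after the first step of the rollout and pass it, instead of first_hidden, to the call that computes v_prime.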