seungeunrho / minimalRL

Implementations of basic RL algorithms with minimal lines of code! (PyTorch based)

LSTM + PPO value fitting

hnshahao opened this issue · comments

Hello, thanks for your great work!
I have one dumb question.
In the LSTM PPO implementation, I noticed that the same first_hidden value is used when calculating both v_prime and v_s. My question is: should v_prime use a different first hidden value, or is this just an approximation?
Thank you!

v_prime = self.v(s_prime, first_hidden).squeeze(1)  # uses first_hidden
td_target = r + gamma * v_prime * done_mask
v_s = self.v(s, first_hidden).squeeze(1)            # also uses first_hidden

It's not dumb at all.
It seems I made a critical mistake.
I will modify the code to use the second hidden state when calculating v_prime.
Thanks for catching this.
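To see why this matters, here is a torch-free toy sketch (the names `step` and `value` are illustrative, not from the repo): with a recurrent value function, v(s_prime) should be evaluated from the hidden state the network actually carried into s_prime, i.e. the state produced *after* processing s, not the same first_hidden used for v(s).

```python
def step(hidden, x):
    """Toy recurrent cell: the new hidden state mixes the old one with the input."""
    return 0.5 * hidden + 0.5 * x

def value(x, hidden):
    """Toy value head: reads the hidden state produced by feeding x into the cell."""
    return step(hidden, x)

first_hidden = 0.0
s, s_prime = 1.0, 2.0

# Hidden state the agent actually carried into s_prime during the rollout:
second_hidden = step(first_hidden, s)

v_s = value(s, first_hidden)                  # correct: v(s) starts from first_hidden
v_prime_wrong = value(s_prime, first_hidden)  # bug: pretends s was never seen
v_prime_right = value(s_prime, second_hidden) # conditions on the true history

print(v_s, v_prime_wrong, v_prime_right)
```

In PPO-LSTM terms, that second hidden state would presumably be the LSTM output state recorded after stepping through the first transition of the rollout (detached from the graph, like first_hidden), so the TD target r + gamma * v(s_prime) is computed from the history the agent actually experienced.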