seungeunrho / minimalRL

Implementations of basic RL algorithms with minimal lines of code! (PyTorch-based)

Reinforce implementation looks to use old data without importance sampling

sritee opened this issue

The traditional implementation of REINFORCE, without importance sampling, should use only data collected by the current policy to update the parameters. However, in reinforce.py, the data buffer doesn't seem to be reset after every policy update. Thoughts?

Hi, thanks for the comment!
I think you can find the code that resets the data buffer on line 37 of reinforce.py.
When the training step ends, it empties the buffer, and new data is then collected with the updated policy.
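
For illustration, the reset pattern described above might look roughly like the sketch below. This is not the code from reinforce.py: the class name `EpisodeBuffer`, the method `train_step`, and the value of `GAMMA` are assumptions made for the example, and the gradient step is left out so the snippet runs without PyTorch.

```python
GAMMA = 0.98  # discount factor; an assumed value for illustration


class EpisodeBuffer:
    """Hypothetical on-policy buffer: holds one episode, cleared after each update."""

    def __init__(self):
        self.data = []

    def put_data(self, reward):
        # Store one reward collected under the current policy.
        self.data.append(reward)

    def train_step(self):
        """Compute discounted returns for the stored episode, then reset the buffer."""
        returns = []
        R = 0.0
        # Walk the episode backwards so each return accumulates future rewards.
        for r in reversed(self.data):
            R = r + GAMMA * R
            returns.append(R)
        returns.reverse()
        # A real implementation would weight log-probabilities by `returns`
        # and take a gradient step here.
        self.data = []  # discard old data so the next update stays on-policy
        return returns


buf = EpisodeBuffer()
for reward in [1.0, 1.0, 1.0]:
    buf.put_data(reward)
returns = buf.train_step()
```

Because the buffer is emptied at the end of every update, each call to `train_step` only ever sees data gathered after the previous update, which is why no importance-sampling correction is needed in this pattern.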