REINFORCE implementation appears to use old data without importance sampling
sritee opened this issue
The traditional implementation of REINFORCE, without importance sampling, should update the parameters using only data collected by the current policy. However, in reinforce.py, the data buffer doesn't seem to be reset after each policy update. Thoughts?
Hi, thanks for the comment!
I think you can find the code that resets the data buffer at line 37 of reinforce.py.
When training ends, it empties the buffer and then collects new data with the updated policy.
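
For anyone else reading this thread, here is a minimal sketch of the collect, update, clear pattern described above. It is not the code from reinforce.py: the two-armed bandit, the softmax policy, and the names `collect`, `reinforce_update`, and `train` are all assumptions made up for illustration. The only point is that the buffer is emptied right after each update, so every gradient step sees data from the current policy only.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def collect(logits, n_samples, rng):
    """Sample (action, reward) pairs from the current policy on a toy 2-armed bandit."""
    probs = softmax(logits)
    batch = []
    for _ in range(n_samples):
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if action == 1 else 0.0  # hypothetical task: arm 1 pays, arm 0 does not
        batch.append((action, reward))
    return batch


def reinforce_update(logits, buffer, lr):
    """One REINFORCE step: ascend the batch mean of reward * grad log pi(action)."""
    probs = softmax(logits)
    grad = [0.0 for _ in logits]
    for action, reward in buffer:
        for k in range(len(logits)):
            indicator = 1.0 if k == action else 0.0
            # d/d logit_k of log softmax(logits)[action] is indicator - probs[k]
            grad[k] += reward * (indicator - probs[k])
    return [w + lr * g / len(buffer) for w, g in zip(logits, grad)]


def train(n_updates=200, batch_size=16, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    buffer = []
    buffer_sizes = []  # recorded only to show the buffer never accumulates
    for _ in range(n_updates):
        buffer.extend(collect(logits, batch_size, rng))  # data from the current policy
        buffer_sizes.append(len(buffer))
        logits = reinforce_update(logits, buffer, lr)
        buffer.clear()  # these samples came from the pre-update policy, so drop them
    return logits, buffer_sizes
```

Calling `train()` returns the final logits and the buffer size seen at each update. If the `buffer.clear()` line were removed, the sizes would grow by `batch_size` every iteration and later updates would mix in samples from older policies, which is the situation the original question is worried about.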