Reinforce implementation looks to use old data without importance sampling

Question

sritee opened this issue 5 years ago

Sridhar Thiagarajan commented 5 years ago

The traditional implementation of REINFORCE, without importance sampling, should use only data collected by the current policy to update the parameters. However, in reinforce.py, the data buffer doesn't seem to be reset after every policy update. Thoughts?
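
For reference, the distinction raised here can be written out in standard policy-gradient notation. The symbols are assumptions for illustration and are not taken from reinforce.py: \pi_\theta is the current policy, \pi_{\theta_{old}} is the policy that collected the data, and G_t is the discounted return from step t. The first line is the on-policy REINFORCE gradient; the second is what reusing old trajectories would require.

```latex
% On-policy REINFORCE: trajectories sampled from the current policy
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}
    \left[ \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, G_t \right]

% Reusing trajectories from an older policy needs an importance weight
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_{\theta_{old}}}
    \left[ \left( \prod_{t} \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)} \right)
           \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, G_t \right]
```

If the buffer is emptied after every update, every stored sample comes from the current policy and the first form applies without any correction.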

Seungeun Rho · Answer 1 · Mon May 27 2019 08:13:31 GMT+0800 (China Standard Time)

Hi, thanks for the comment!
I think you can find the code for resetting the data buffer at line 37 of reinforce.py.
At the end of each training step, it empties the buffer, so new data is collected with the updated policy.
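
The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in reinforce.py: the `Policy` class, its `put_data` and `train_net` methods, the two-action toy environment in `run`, and all hyperparameters are hypothetical, and a hand-written softmax policy stands in for a neural network so the sketch needs only the standard library.

```python
import math
import random


class Policy:
    """Toy softmax policy over two actions (hypothetical stand-in for a network)."""

    def __init__(self, lr=0.1, gamma=0.98):
        self.logits = [0.0, 0.0]
        self.lr = lr
        self.gamma = gamma
        self.data = []  # (reward, action) pairs collected by the current policy

    def probs(self):
        # Numerically stable softmax over the two logits.
        m = max(self.logits)
        exps = [math.exp(l - m) for l in self.logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample_action(self, rng):
        return 0 if rng.random() < self.probs()[0] else 1

    def put_data(self, item):
        self.data.append(item)

    def train_net(self):
        # One REINFORCE update from the episode stored in the buffer.
        p = self.probs()
        grad = [0.0, 0.0]
        G = 0.0
        for reward, action in reversed(self.data):
            G = reward + self.gamma * G  # discounted return from this step
            for k in range(2):
                indicator = 1.0 if k == action else 0.0
                # d log pi(action) / d logit_k for a softmax policy
                grad[k] += (indicator - p[k]) * G
        # Gradient ascent on the expected return.
        self.logits = [l + self.lr * g for l, g in zip(self.logits, grad)]
        # Reset the buffer so the next update sees only fresh on-policy data.
        self.data = []


def run(episodes=5, steps=4, seed=0):
    """Toy loop: collect one episode, update once, then start from an empty buffer."""
    rng = random.Random(seed)
    pi = Policy()
    for _ in range(episodes):
        for _ in range(steps):
            a = pi.sample_action(rng)
            r = 1.0 if a == 1 else 0.0  # assumed toy reward: action 1 pays 1
            pi.put_data((r, a))
        pi.train_net()
    return pi
```

Because `train_net` ends by reassigning `self.data` to an empty list, each update uses only the episode gathered since the previous update, which is why no importance weights are needed in this sketch.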