Random samplers keep state
gshennvm opened this issue
Gerald Shen commented
Since #73 we've switched to using random samplers to support multiple epochs.
However, the `__iter__` method of the random sampler keeps state, so validation on a subset of our data (as in PPO) uses a different subset each time.
In PPO we try to reset the dataloader by calling `iter` on it, but because the random sampler keeps state, this does not actually reset the validation set.
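
To illustrate the behaviour, here is a minimal sketch in plain Python. `StatefulRandomSampler`, its `consumed` counter, and the `take` helper are hypothetical names made up for this example and are not the actual sampler from the repo; the sketch only assumes that the sampler tracks how many samples it has handed out and resumes from that position whenever `__iter__` is called.

```python
import random


class StatefulRandomSampler:
    """Hypothetical sampler whose position survives across __iter__ calls."""

    def __init__(self, num_samples, seed=0):
        self.num_samples = num_samples
        self.seed = seed
        self.consumed = 0  # persists between iter() calls (assumed behaviour)

    def __iter__(self):
        while True:
            # Resume from wherever the previous iterator stopped.
            epoch, offset = divmod(self.consumed, self.num_samples)
            order = list(range(self.num_samples))
            random.Random(self.seed + epoch).shuffle(order)
            for idx in order[offset:]:
                self.consumed += 1
                yield idx


def take(sampler, n):
    """Start a fresh iterator and pull the first n indices from it."""
    it = iter(sampler)
    return [next(it) for _ in range(n)]


sampler = StatefulRandomSampler(num_samples=8)

# Two "validation runs" that each call iter() and read 3 samples.
first = take(sampler, 3)
second = take(sampler, 3)  # continues where `first` stopped, so a different subset

# Explicitly clearing the state restores the original subset.
sampler.consumed = 0
after_reset = take(sampler, 3)
```

In this sketch the second call to `iter` does not start over, so the two validation runs see disjoint subsets. Only resetting the state explicitly (here by zeroing `consumed`) makes the subset repeat, which suggests the real sampler may need some equivalent reset hook for validation.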