random sampler keeps state

Question

gshennvm opened this issue 7 months ago

Since #73 we've switched to using random samplers to support multiple epochs.

However, the `__iter__` method of the random sampler keeps state, so validating on a subset of our data (as we do in PPO) uses a different subset each time.

In PPO we try to reset the validation dataloader by calling `iter` on it, but because the random sampler keeps state, this does not actually reset the validation set.
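To make the behavior concrete, here is a minimal sketch in plain Python. It is not our actual sampler: `StatefulRandomSampler`, `ResettableRandomSampler`, and `first_k` are hypothetical names made up for illustration, and the reseed-on-`__iter__` approach is only one possible fix, assuming we want every validation run to see the same subset.

```python
import random


class StatefulRandomSampler:
    """Hypothetical sampler: one RNG is shared across every __iter__ call."""

    def __init__(self, num_samples, seed=0):
        self.num_samples = num_samples
        self.rng = random.Random(seed)  # state persists between iterations

    def __iter__(self):
        indices = list(range(self.num_samples))
        self.rng.shuffle(indices)  # advances the shared RNG state
        return iter(indices)


class ResettableRandomSampler:
    """Hypothetical alternative: every __iter__ call starts from the same seed."""

    def __init__(self, num_samples, seed=0):
        self.num_samples = num_samples
        self.seed = seed

    def __iter__(self):
        rng = random.Random(self.seed)  # fresh RNG, so the order is reproducible
        indices = list(range(self.num_samples))
        rng.shuffle(indices)
        return iter(indices)


def first_k(sampler, k):
    """Take the first k indices, mimicking validation on a subset."""
    it = iter(sampler)  # this is the "reset" we attempt in PPO
    return [next(it) for _ in range(k)]
```

With the stateful version, two consecutive `first_k(sampler, 8)` calls return different indices even though `iter` is called again each time. With the resettable version they return the same indices. If we still want a new shuffle per training epoch, the seed could be derived from an explicit epoch counter rather than from hidden RNG state.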