NVIDIA / NeMo-Aligner

Scalable toolkit for efficient model alignment

Random samplers keep state

gshennvm opened this issue

Since #73, we've switched to using random samplers to support multiple epochs.

However, the random sampler's `__iter__` method keeps state between calls. As a result, when we validate on only a subset of our data (as in PPO), each validation run sees a different subset.
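A minimal pure-Python sketch of the failure mode, assuming a sampler that creates one RNG in `__init__` and advances it on every `__iter__` call. `StatefulRandomSampler` and `validation_subset` are hypothetical names used only for illustration, not the actual NeMo-Aligner or PyTorch classes.

```python
import random
from typing import Iterator, List


class StatefulRandomSampler:
    """Hypothetical sampler: a single RNG lives on the object and is advanced by each __iter__."""

    def __init__(self, num_samples: int, seed: int = 0) -> None:
        self.num_samples = num_samples
        self.rng = random.Random(seed)  # state shared across all iterations

    def __iter__(self) -> Iterator[int]:
        indices = list(range(self.num_samples))
        self.rng.shuffle(indices)  # mutates self.rng, so the next call yields a new order
        return iter(indices)


def validation_subset(sampler, limit: int) -> List[int]:
    """Hypothetical helper: take only the first `limit` indices, as when validating on a subset."""
    it = iter(sampler)
    return [next(it) for _ in range(limit)]


sampler = StatefulRandomSampler(num_samples=100, seed=1234)
first = validation_subset(sampler, limit=8)
second = validation_subset(sampler, limit=8)
# Calling iter() again does not restore the original order, so the two
# "identical" validation passes look at different samples.
print(first, second)
```

Even though the seed is fixed, only the first permutation is reproducible; every later `iter()` continues from wherever the RNG was left.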

In PPO, we try to reset the validation dataloader by calling `iter` on it, but because the random sampler keeps state, this does not actually reset the validation set.
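One possible way to make `iter()` a real reset is sketched below, assuming it is acceptable for the sampler to rebuild its RNG from `(seed, epoch)` at the start of every `__iter__` and to change order only when the caller explicitly advances the epoch. This mirrors the `seed + epoch` convention of PyTorch's `DistributedSampler.set_epoch`; `EpochSeededRandomSampler` and `validation_subset` are hypothetical names, and this is not necessarily how the issue was resolved in NeMo-Aligner.

```python
import random
from typing import Iterator, List


class EpochSeededRandomSampler:
    """Hypothetical sampler: keeps no RNG between calls; order depends only on (seed, epoch)."""

    def __init__(self, num_samples: int, seed: int = 0) -> None:
        self.num_samples = num_samples
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # The caller decides when the order should change (e.g. once per training epoch).
        self.epoch = epoch

    def __iter__(self) -> Iterator[int]:
        # Fresh RNG on every call: the same (seed, epoch) always gives the same permutation.
        rng = random.Random(self.seed + self.epoch)
        indices = list(range(self.num_samples))
        rng.shuffle(indices)
        return iter(indices)


def validation_subset(sampler, limit: int) -> List[int]:
    """Hypothetical helper: take only the first `limit` indices."""
    it = iter(sampler)
    return [next(it) for _ in range(limit)]


val_sampler = EpochSeededRandomSampler(num_samples=100, seed=1234)
first = validation_subset(val_sampler, limit=8)
second = validation_subset(val_sampler, limit=8)
# Re-calling iter() now reproduces the same validation subset.
print(first == second)
```

The design choice is to move randomness out of the iterator's hidden state and into explicit inputs: training samplers can still reshuffle each epoch via `set_epoch`, while a validation sampler whose epoch is never advanced yields the same subset on every `iter()` call.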