Dataloader shuffle_rng logic bug under multi-gpu settings?
codedeft opened this issue · comments
It seems that under a multi-GPU training run, each process uses a different seed for data shuffling (`42 + process_rank`, as seen in `dataloader.h` line 172). As a result, the processes produce different random permutations of `shard_indices` as well as `intra_shard_indices`, which can potentially lead to overlapping data being loaded across processes.
Is this expected? I would have expected the random seed for data shuffling to be the same across all processes, so that every rank works from the same permutation.