karpathy / llm.c

LLM training in simple, raw C/CUDA

Repository from GitHub: https://github.com/karpathy/llm.c

Dataloader shuffle_rng logic bug under multi-gpu settings?

codedeft opened this issue

It seems that each process in a multi-GPU training run uses a different seed for data shuffling (42 + process_rank, as seen in dataloader.h line 172). This gives each process a different random permutation of shard_indices, as well as of intra_shard_indices, and can lead to overlapping data loads across processes.
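
To make the concern concrete, here is a minimal, self-contained sketch of the behavior being described. The RNG and shuffle helpers below are hypothetical stand-ins (llm.c has its own Mersenne Twister and permutation code in the repo); the point is only that seeding with 42 + process_rank gives every rank its own shard permutation, rather than one shared permutation that ranks could slice disjointly.

```c
#include <stdio.h>
#include <stdint.h>

// hypothetical stand-in RNG (xorshift64*), not the actual llm.c rand code
static uint64_t rng_state;
static void rng_seed(uint64_t seed) { rng_state = seed; }
static uint32_t rng_u32(void) {
    rng_state ^= rng_state >> 12;
    rng_state ^= rng_state << 25;
    rng_state ^= rng_state >> 27;
    return (uint32_t)((rng_state * 0x2545F4914F6CDD1DULL) >> 32);
}

// Fisher-Yates shuffle of the shard indices
static void shuffle(int *idx, int n) {
    for (int i = n - 1; i > 0; i--) {
        int j = (int)(rng_u32() % (uint32_t)(i + 1));
        int tmp = idx[i]; idx[i] = idx[j]; idx[j] = tmp;
    }
}

int main(void) {
    const int num_shards = 8;
    for (int process_rank = 0; process_rank < 2; process_rank++) {
        int shard_indices[8];
        for (int i = 0; i < num_shards; i++) shard_indices[i] = i;
        rng_seed(42 + process_rank);  // per-rank seed, as in the issue
        shuffle(shard_indices, num_shards);
        printf("rank %d permutation:", process_rank);
        for (int i = 0; i < num_shards; i++) printf(" %d", shard_indices[i]);
        printf("\n");
    }
    // Because each rank shuffles independently, rank 0 and rank 1 can both
    // visit the same shard early in an epoch; with a shared seed, each rank
    // could instead take a disjoint, strided slice of one common permutation.
    return 0;
}
```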

Is this expected? I would have expected the random seed for data shuffling to be the same across all processes.