Multi-GPU training is 4 times slower
shamsbasir opened this issue
Shams Basir commented
Hi @suraj813,
I used your tutorial code to experiment with distributed training. My training time for one epoch with multigpu_torchrun.py is 4 times slower than with single_gpu.py. I experimented with multiple batch sizes, a larger training set, and a larger network to make sure the communication overhead is not higher than the computation cost.
Could you please point me to what the issue might be?
Thanks
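For concreteness, here is a minimal sketch of one way to time a single epoch under DDP as launched by torchrun, so the single-GPU and multi-GPU numbers are measured the same way. The model, dataset, and sizes below are stand-ins, not the tutorial's actual code:

```python
import os
import time

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    # torchrun starts one process per GPU and sets LOCAL_RANK/RANK/WORLD_SIZE
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in model and synthetic data (placeholders, not the tutorial's)
    model = DDP(torch.nn.Linear(1024, 10).cuda(local_rank), device_ids=[local_rank])
    dataset = TensorDataset(torch.randn(65536, 1024), torch.randint(0, 10, (65536,)))
    sampler = DistributedSampler(dataset)  # each rank sees 1/world_size of the data
    loader = DataLoader(dataset, batch_size=256, sampler=sampler)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    # Align all ranks and flush pending CUDA work before starting the clock
    dist.barrier()
    torch.cuda.synchronize()
    start = time.time()

    model.train()
    sampler.set_epoch(0)
    for x, y in loader:
        x = x.cuda(local_rank, non_blocking=True)
        y = y.cuda(local_rank, non_blocking=True)
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()  # DDP all-reduces gradients here
        optimizer.step()

    torch.cuda.synchronize()
    if dist.get_rank() == 0:
        print(f"epoch time: {time.time() - start:.2f}s")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with, e.g., `torchrun --standalone --nproc_per_node=4 timing_sketch.py` (the filename is hypothetical). Since the DistributedSampler gives each rank only 1/N of the data, an epoch should normally be faster than the single-GPU run unless gradient communication dominates.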
Suraj Subramanian commented
Duplicate of pytorch/tutorials#2114