pytorch / examples

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.

Home Page: https://pytorch.org/examples

Multi-GPU training is 4 times slower

shamsbasir opened this issue · comments

Hi @suraj813,

I used your tutorial code to experiment with distributed training. My training time for one epoch with multigpu_torchrun.py is 4 times longer than with single_gpu.py. I experimented with multiple batch sizes, a larger training set, and a larger network to make sure the communication overhead is not higher than the computation.
Could you please point me to what could possibly be the issue?
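For context, here is a simplified sketch of the kind of per-rank epoch timing I mean (not the exact tutorial code; it follows the same DDP + DistributedSampler pattern as multigpu_torchrun.py and assumes the script is launched with torchrun after `init_process_group` has been called):

```python
import os
import time

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler


def timed_epoch(model, dataset, batch_size):
    """Run one DDP training epoch and print its wall-clock time on rank 0.

    Assumes torchrun set LOCAL_RANK and that
    dist.init_process_group("nccl") was already called.
    """
    local_rank = int(os.environ["LOCAL_RANK"])
    sampler = DistributedSampler(dataset)  # each rank sees 1/world_size of the data
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler, pin_memory=True)

    ddp_model = DDP(model.to(local_rank), device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()

    torch.cuda.synchronize()
    start = time.time()
    for inputs, targets in loader:
        inputs = inputs.to(local_rank, non_blocking=True)
        targets = targets.to(local_rank, non_blocking=True)
        optimizer.zero_grad()
        loss_fn(ddp_model(inputs), targets).backward()
        optimizer.step()
    torch.cuda.synchronize()

    if dist.get_rank() == 0:
        print(f"epoch time: {time.time() - start:.2f} s")
```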

Thanks