junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

Multi-GPU speed?

gabgren opened this issue

Hi!

I was under the assumption that using multiple GPUs for training pix2pix would result in faster training, but this is not what I am experiencing. In fact, I get slower speeds; the best I can do is keep the s/it roughly the same as with 1 GPU.

For testing, I used batch_size 8 for a single GPU and batch_size 64 for 8 GPUs. Tests were done on 8x A6000 and 8x 3090. I have also tried setting norm to both instance and batch, with no effect.

What am I doing or getting wrong? Am I right to expect faster training with more GPUs, or is it that by using multiple gpu_ids I get to train at a higher resolution?

Thanks!

Could you check whether the GPU utilization is at 100%? It could be that the data loader does not feed training images fast enough. Another possibility is that progress in terms of the total number of images used for training is actually faster with more GPUs, but if you are monitoring the number of iterations, it won't look different.
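
As a quick check, you can time how long each iteration waits on the data loader versus how long the GPU compute itself takes. Below is a rough diagnostic sketch, assuming the `dataset` and `model` objects created by the repo's `create_dataset(opt)` and `create_model(opt)` helpers in `train.py` (the training script already prints a similar `data:` timing in its console log, if I recall correctly):

```python
import time
import torch

# Rough diagnostic only (not part of the repo): measure how long each iteration
# waits on the data loader vs. how long the GPU work takes. Assumes `dataset`
# and `model` come from the repo's create_dataset(opt) / create_model(opt)
# helpers, as in train.py.
iter_end = time.time()
for i, data in enumerate(dataset):
    t_data = time.time() - iter_end        # time spent waiting for the next batch

    model.set_input(data)                  # unpack the batch onto the GPU(s)
    model.optimize_parameters()            # forward + backward + optimizer step
    torch.cuda.synchronize()               # wait for GPU kernels before timing
    t_compute = time.time() - iter_end - t_data

    if i % 100 == 0:
        print(f"iter {i}: data wait {t_data:.3f}s, compute {t_compute:.3f}s")
    iter_end = time.time()
```

If the data-wait time dominates, adding more GPUs won't help until the loader itself is faster.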

Looks like it's your first theory: it takes a long time to feed the 8 GPUs. The actual processing seems to be faster, but everything stalls between iterations. See this comparison of GPU utilization for 1x A6000 vs. 8x A6000:
[Screenshots: GPU utilization, 1x A6000 vs. 8x A6000]

How can I speed this up?

It might be a data loading issue. You may want to use an SSD or another fast file system.
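
If loading is the bottleneck, raising `--num_threads` (which the repo passes to the DataLoader's `num_workers`) is usually the first thing to try, along with pinned memory. Here is a plain-PyTorch sketch of the relevant knobs, for illustration only; the repo builds its own loader in `data/__init__.py`, and `dataset` below stands in for any torch Dataset:

```python
from torch.utils.data import DataLoader

# Illustrative sketch only; the repo constructs its own loader inside
# data/__init__.py, but the same knobs apply.
loader = DataLoader(
    dataset,                  # placeholder: any torch Dataset, e.g. the repo's aligned dataset
    batch_size=64,
    shuffle=True,
    num_workers=16,           # more CPU workers to keep 8 GPUs fed (--num_threads in this repo)
    pin_memory=True,          # page-locked host memory for faster host-to-device copies
    persistent_workers=True,  # keep workers alive between epochs (PyTorch >= 1.7)
)
```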

@gabgren I have 4 GPUs and want to use all 4 of them for accelerated training at the same time. How can I modify the code? At present it only trains on one GPU and the training speed is very slow; --gpu_ids 0,1,2,3 does not work. Thank you!

What is your batch_size? By "does not work", do you mean (1) the model is only trained on one GPU, or (2) the model is trained on multiple GPUs, but training is not as fast as you expect?

@junyanz batch_size is 4. After setting --gpu_ids 0,1,2,3, the model is still only trained on one GPU.

This could be due to a limitation of nn.DataParallel, which we use here and which was a common approach when we published the repo. It does suffer from suboptimal GPU utilization because the data loading is inefficient. A better way would be to use DistributedDataParallel (link). We don't plan to support this for now, but if someone could create a PR, I'd appreciate it.
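
For anyone who wants to experiment before official support exists, here is a minimal sketch of the DistributedDataParallel pattern. It is not integrated with this repo; `build_model()`, `dataset`, and `n_epochs` are placeholders for the repo's model and dataset construction:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# Minimal DDP sketch. Launch with: torchrun --nproc_per_node=4 train_ddp.py
# build_model(), dataset, and n_epochs are placeholders, not repo code.
def main():
    dist.init_process_group(backend="nccl")           # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = build_model().cuda(local_rank)            # placeholder for netG/netD construction
    model = DDP(model, device_ids=[local_rank])

    sampler = DistributedSampler(dataset)             # each process sees its own shard of the data
    loader = DataLoader(dataset, batch_size=8, sampler=sampler,
                        num_workers=4, pin_memory=True)

    for epoch in range(n_epochs):
        sampler.set_epoch(epoch)                      # reshuffle shards every epoch
        for batch in loader:
            ...                                       # forward / backward / optimizer step

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

With DDP each GPU runs in its own process and only gradients are synchronized, which generally scales much better than DataParallel's single-process scatter/gather.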