Program gets stuck when training on more than 2 GPUs on a single machine
August0424 opened this issue
I used 2 GPUs on a single machine to train the model, and it worked correctly. However, when using more than 2 GPUs, the program gets stuck in the network's forward propagation step, while GPU utilization stays at 100%.
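The issue as filed does not name the framework or launch method, so here is a minimal repro sketch under assumptions: PyTorch with `DistributedDataParallel` and the NCCL backend, one process per GPU spawned via `torch.multiprocessing`. The script, file name, and port are hypothetical; the point is to vary the GPU count and watch where the per-rank prints stop, which isolates whether the hang is in the forward pass collective communication.

```python
# Hypothetical minimal repro (the original issue does not name its framework).
# Assumes PyTorch + NCCL. Run with NCCL_DEBUG=INFO to get NCCL's own logs, e.g.:
#   NCCL_DEBUG=INFO python repro.py
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank: int, world_size: int) -> None:
    # Rendezvous settings; 127.0.0.1/29500 are placeholder values.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = DDP(nn.Linear(128, 128).cuda(rank), device_ids=[rank])
    data = torch.randn(32, 128, device=rank)

    for step in range(10):
        out = model(data)      # the reported hang is in this forward pass
        out.sum().backward()   # DDP's gradient all-reduce fires here
        print(f"rank {rank} finished step {step}", flush=True)

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()  # try 2, then 3+, to compare
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

For what it's worth, in similar reports a hang at 100% GPU utilization that appears only above 2 GPUs has often been an NCCL peer-to-peer transport problem on the host (PCIe/IOMMU related) rather than a bug in the training code; running with `NCCL_P2P_DISABLE=1` is a common way to test that hypothesis.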