Program gets stuck when training on more than 2 GPUs on a single machine
August0424 opened this issue
I used 2 GPUs on a single machine to train the model, and it worked correctly. However, when using more than 2 GPUs, the program gets stuck in the network's forward propagation step, while GPU utilization stays at 100%.
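The issue as filed does not name the framework or launch method, so here is a minimal repro sketch under assumptions: PyTorch with `DistributedDataParallel` and the NCCL backend, one process per GPU spawned via `torch.multiprocessing`. The script, file name, and port are hypothetical; the point is to vary the GPU count and watch where the per-rank prints stop, which isolates whether the hang is in the forward pass collective communication.

```python
# Hypothetical minimal repro (the original issue does not name its framework).
# Assumes PyTorch + NCCL. Run with NCCL_DEBUG=INFO to get NCCL's own logs, e.g.:
#   NCCL_DEBUG=INFO python repro.py
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank: int, world_size: int) -> None:
    # Rendezvous settings; 127.0.0.1/29500 are placeholder values.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = DDP(nn.Linear(128, 128).cuda(rank), device_ids=[rank])
    data = torch.randn(32, 128, device=rank)

    for step in range(10):
        out = model(data)      # the reported hang is in this forward pass
        out.sum().backward()   # DDP's gradient all-reduce fires here
        print(f"rank {rank} finished step {step}", flush=True)

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()  # try 2, then 3+, to compare
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

For what it's worth, in similar reports a hang at 100% GPU utilization that appears only above 2 GPUs has often been an NCCL peer-to-peer transport problem on the host (PCIe/IOMMU related) rather than a bug in the training code; running with `NCCL_P2P_DISABLE=1` is a common way to test that hypothesis.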