Issue with Multi-GPU Training/Predicting using --gpu_id
Biste-Wang opened this issue
Biste-Wang commented
Problem:
I'm facing an issue when attempting to train or predict on multiple GPUs using the --gpu_id flag. Despite specifying multiple GPUs (--gpu_id 0,1), only one GPU is utilized.
Environment:
- PyTorch version: 1.12.0
- CUDA version: 12.2
- GPUs: 2 (IDs 0 and 1)
- Operating System: Windows
Reproducible Example:
python predict.py --input datasets/data/mydata --dataset cityscapes --model deeplabv3plus_resnet101 --val_batch_size 64 --ckpt checkpoints/best_deeplabv3plus_resnet101_cityscapes_os16.pth --save_val_results_to test_results/myresult --gpu_id 0,1
Expected Behavior:
I expect training or prediction to utilize both GPUs specified in --gpu_id.
Actual Behavior:
Only one GPU is used; the workload is not distributed across the specified GPUs.
Additional Information:
- I have verified that both GPUs are available and functional.
- My PyTorch version is up-to-date.
- CUDA and cuDNN versions are compatible with PyTorch.
- The model and optimizer are moved to the correct device in the script.
Any help or suggestions to troubleshoot this issue would be greatly appreciated!
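For reference, here is a minimal sketch of how I would expect multi-GPU inference to behave, assuming the script maps --gpu_id to CUDA_VISIBLE_DEVICES and wraps the model in nn.DataParallel (the tiny Conv2d model and the env-var handling below are illustrative stand-ins, not the repository's actual code):

```python
import os

# Hypothetical handling of --gpu_id: CUDA_VISIBLE_DEVICES must be set
# *before* torch initializes CUDA for the restriction to take effect.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1")

import torch
import torch.nn as nn

# Toy stand-in for the segmentation network; the real script loads
# deeplabv3plus_resnet101 from a checkpoint instead.
model = nn.Conv2d(3, 19, kernel_size=1)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Sanity check: this should report 2 if both GPUs are visible.
print("visible GPUs:", torch.cuda.device_count())

# With more than one visible GPU, nn.DataParallel splits each input
# batch across the devices; with zero or one GPU it simply calls the
# wrapped module, so the code also runs on a single-GPU or CPU box.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

x = torch.randn(4, 3, 64, 64, device=device)
with torch.no_grad():
    out = model(x)
print(tuple(out.shape))  # (4, 19, 64, 64)
```

If predict.py never wraps the model like this (or builds the model before CUDA_VISIBLE_DEVICES is set), that would explain why only one GPU is used even when --gpu_id 0,1 is passed.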