Issue with Multi-GPU Training/Predicting using --gpu_id
Biste-Wang opened this issue
Biste-Wang commented
Problem:
I'm facing an issue when attempting to train or predict on multiple GPUs using the --gpu_id flag. Despite specifying multiple GPUs (--gpu_id 0,1), only one GPU is utilized.
Environment:
- PyTorch version: 1.12.0
- CUDA version: 12.2
- GPUs: 2 (IDs 0 and 1)
- Operating System: Windows
Reproducible Example:
python predict.py --input datasets/data/mydata --dataset cityscapes --model deeplabv3plus_resnet101 --val_batch_size 64 --ckpt checkpoints/best_deeplabv3plus_resnet101_cityscapes_os16.pth --save_val_results_to test_results/myresult --gpu_id 0,1
Expected Behavior:
I expect training or prediction to utilize both GPUs specified in --gpu_id.
Actual Behavior:
Only one GPU is used; the workload is not distributed across the specified GPUs.
Additional Information:
- I have verified that both GPUs are available and functional.
- My PyTorch version is up-to-date.
- CUDA and cuDNN versions are compatible with PyTorch.
- The model and optimizer are moved to the correct device in the script.
Any help or suggestions to troubleshoot this issue would be greatly appreciated!
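For reference, here is a minimal sketch of how I would expect multi-GPU inference to behave, assuming the script maps --gpu_id to CUDA_VISIBLE_DEVICES and wraps the model in nn.DataParallel (the tiny Conv2d model and the env-var handling below are illustrative stand-ins, not the repository's actual code):

```python
import os

# Hypothetical handling of --gpu_id: CUDA_VISIBLE_DEVICES must be set
# *before* torch initializes CUDA for the restriction to take effect.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1")

import torch
import torch.nn as nn

# Toy stand-in for the segmentation network; the real script loads
# deeplabv3plus_resnet101 from a checkpoint instead.
model = nn.Conv2d(3, 19, kernel_size=1)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Sanity check: this should report 2 if both GPUs are visible.
print("visible GPUs:", torch.cuda.device_count())

# With more than one visible GPU, nn.DataParallel splits each input
# batch across the devices; with zero or one GPU it simply calls the
# wrapped module, so the code also runs on a single-GPU or CPU box.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

x = torch.randn(4, 3, 64, 64, device=device)
with torch.no_grad():
    out = model(x)
print(tuple(out.shape))  # (4, 19, 64, 64)
```

If predict.py never wraps the model like this (or builds the model before CUDA_VISIBLE_DEVICES is set), that would explain why only one GPU is used even when --gpu_id 0,1 is passed.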