Hello, does this model support training on a single GPU?
Before-dawn-1 opened this issue
Before_dawn commented
Describe the bug
ERROR:torch.distributed.elastic.multiprocessing.api:failed
To Reproduce
Dataset: VOC2012
Setting: ...
Command or script used:
I tried two methods:
- python -m torch.distributed.launch --nproc_per_node=1 run.py --data_root /home/before_dawn/Code/Dataset/VOCtrainval_11-May-2012/VOCdevkit/VOC2012 --batch_size 12 --dataset voc --name PLOP --task 15-5s --overlap --step 1 --lr 0.001 --epochs 30 --method FT --pod local --pod_factor 0.01 --pod_logits --pseudo entropy --threshold 0.001 --classif_adaptive_factor --init_balanced --pod_options '{"switch": {"after": {"extra_channels": "sum", "factor": 0.0005, "type": "local"}}}'
- python run.py --data_root /home/before_dawn/Code/Dataset/VOCtrainval_11-May-2012/VOCdevkit/VOC2012 --batch_size 12 --dataset voc --name PLOP --task 15-5s --overlap --step 1 --lr 0.001 --epochs 30 --method FT --pod local --pod_factor 0.01 --pod_logits --pseudo entropy --threshold 0.001 --classif_adaptive_factor --init_balanced --pod_options '{"switch": {"after": {"extra_channels": "sum", "factor": 0.0005, "type": "local"}}}'
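(Note that the --pod_options value must reach run.py as one valid JSON string, so it needs single quotes around it; with nested unescaped double quotes the shell mangles it. A quick standalone way to check that the quoting survives the shell, independent of this repository, is to round-trip the argument through json.loads:)

```bash
# Sanity-check that the quoted JSON argument survives the shell
# (standalone check, not part of the PLOP repository):
python -c 'import json, sys; print(json.loads(sys.argv[1]))' \
  '{"switch": {"after": {"extra_channels": "sum", "factor": 0.0005, "type": "local"}}}'
```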
Additional context
Does this model support training on a single GPU?
Arthur Douillard commented
It does work on a single GPU. Look at the provided scripts, e.g. https://github.com/arthurdouillard/CVPR2021_PLOP/blob/main/scripts/voc/plop_15-1.sh, and set GPU=0 and NB_GPU=1.
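For reference, a minimal sketch of that single-GPU setup, assuming the linked script drives run.py through torch.distributed.launch like the commands above (GPU and NB_GPU are the variables named in the script; the data path is a placeholder and the flags are taken from the commands in this issue):

```bash
# Single-GPU variant of scripts/voc/plop_15-1.sh (sketch, not verbatim):
GPU=0      # id of the one GPU to use
NB_GPU=1   # spawn a single worker process

CUDA_VISIBLE_DEVICES=${GPU} python -m torch.distributed.launch \
  --nproc_per_node=${NB_GPU} run.py \
  --data_root /path/to/VOCdevkit/VOC2012 \
  --dataset voc --name PLOP --task 15-5s --overlap --step 1 \
  --lr 0.001 --epochs 30 --batch_size 12
```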