Hello, does this model support training on a single GPU?
Before-dawn-1 opened this issue
Before_dawn commented
Describe the bug
ERROR:torch.distributed.elastic.multiprocessing.api:failed
To Reproduce
Dataset: VOC2012
Setting: ...
Command or script used:
I tried two methods:
- python -m torch.distributed.launch --nproc_per_node=1 run.py --data_root /home/before_dawn/Code/Dataset/VOCtrainval_11-May-2012/VOCdevkit/VOC2012 --batch_size 12 --dataset voc --name PLOP --task 15-5s --overlap --step 1 --lr 0.001 --epochs 30 --method FT --pod local --pod_factor 0.01 --pod_logits --pseudo entropy --threshold 0.001 --classif_adaptive_factor --init_balanced --pod_options '{"switch": {"after": {"extra_channels": "sum", "factor": 0.0005, "type": "local"}}}'
- python run.py --data_root /home/before_dawn/Code/Dataset/VOCtrainval_11-May-2012/VOCdevkit/VOC2012 --batch_size 12 --dataset voc --name PLOP --task 15-5s --overlap --step 1 --lr 0.001 --epochs 30 --method FT --pod local --pod_factor 0.01 --pod_logits --pseudo entropy --threshold 0.001 --classif_adaptive_factor --init_balanced --pod_options '{"switch": {"after": {"extra_channels": "sum", "factor": 0.0005, "type": "local"}}}'
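(Note that the --pod_options value must reach run.py as one valid JSON string, so it needs single quotes around it; with nested unescaped double quotes the shell mangles it. A quick standalone way to check that the quoting survives the shell, independent of this repository, is to round-trip the argument through json.loads:)

```bash
# Sanity-check that the quoted JSON argument survives the shell
# (standalone check, not part of the PLOP repository):
python -c 'import json, sys; print(json.loads(sys.argv[1]))' \
  '{"switch": {"after": {"extra_channels": "sum", "factor": 0.0005, "type": "local"}}}'
```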
Additional context
Does this model support training on a single GPU?
Arthur Douillard commented
It does work on a single GPU. Look at the provided scripts, e.g. https://github.com/arthurdouillard/CVPR2021_PLOP/blob/main/scripts/voc/plop_15-1.sh, and set GPU=0 and NB_GPU=1.
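For reference, a minimal sketch of that single-GPU setup, assuming the linked script drives run.py through torch.distributed.launch like the commands above (GPU and NB_GPU are the variables named in the script; the data path is a placeholder and the flags are taken from the commands in this issue):

```bash
# Single-GPU variant of scripts/voc/plop_15-1.sh (sketch, not verbatim):
GPU=0      # id of the one GPU to use
NB_GPU=1   # spawn a single worker process

CUDA_VISIBLE_DEVICES=${GPU} python -m torch.distributed.launch \
  --nproc_per_node=${NB_GPU} run.py \
  --data_root /path/to/VOCdevkit/VOC2012 \
  --dataset voc --name PLOP --task 15-5s --overlap --step 1 \
  --lr 0.001 --epochs 30 --batch_size 12
```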