yz93 / LAVT-RIS

CUDA memory

Huntersxsx opened this issue

Hello, thanks for your great work!
You mentioned that 'The released lavt_one weights were trained using 8 x 32G V100 cards (max mem on each card was about 13G)', but I only have 8 x 11G 2080Ti GPUs. Therefore, I tried to use swin-tiny instead of swin-base. Unfortunately, 'CUDA out of memory. Tried to allocate 170.00 MiB (GPU 5; 10.76 GiB total capacity; 9.31 GiB already allocated; 47.12 MiB free; 9.68 GiB reserved in total by PyTorch)' still occurs.
I use the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 train.py \
--model lavt_one --dataset refcoco --model_id refcoco \
--batch-size 8 --lr 0.00005 --wd 1e-2 \
--swin_type tiny --pretrained_swin_weights ./pretrained_weights/swin_tiny_patch4_window7_224.pth \
--epochs 40 --img_size 480 2>&1 | tee ./models/refcoco/output
I want to know whether my command is incorrect, or whether 8 x 2080Ti cards cannot support even swin-tiny.
Looking forward to your reply. Thank you~

Hi, no problem.

11 GB should be enough for a swin-tiny.

Change --batch-size to 4 instead of 8. This argument is the number of samples per GPU card.

For instance, if you use 8 cards and --batch-size 8, then the total batch size would be 64. Changing it to 4 would give you a total batch size of 4x8=32.

That should solve the problem.
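
For reference, here is the same launch command with only --batch-size lowered to 4 (every other argument kept from your command above); this is a sketch, so please double-check it against your local setup:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 train.py \
--model lavt_one --dataset refcoco --model_id refcoco \
--batch-size 4 --lr 0.00005 --wd 1e-2 \
--swin_type tiny --pretrained_swin_weights ./pretrained_weights/swin_tiny_patch4_window7_224.pth \
--epochs 40 --img_size 480 2>&1 | tee ./models/refcoco/output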

Thank you, it works!
I have another concern about the test process.
Your concurrent work, CVPR2022-CRIS, uses different checkpoints for different validation sets, as stated in one of their issues.
In other words, they use 3 different checkpoints to test on RefCOCO val, testA, and testB. I wonder whether you choose the same checkpoint for all 3 validation sets of the same dataset, or whether you train 3 different checkpoints as CRIS did?

Hi,

We use one checkpoint for evaluation on all subsets of a dataset.

It is incorrect to use separate checkpoints for those subsets. We validated via the 'val' set (i.e., deciding on the weights); then we used the validated weights for evaluation on the test sets. The concepts of validating and testing really go back to the very basics of machine learning.

If a dataset simply doesn't have a test set, or the annotations for the test data are not available, then reporting on the validation set alone is what it is. But if there are test sets, it would be very wrong to fit the model to the test sets. One can only fit the model to the validation set, and evaluate it on the test sets.

Having said that, as the name 'val' suggests, the 'val' subset is the validation set. And as the names 'testA' and 'testB' suggest, those subsets are test sets.

It is very wrong to refer to all of those subsets as "validation" sets. They are not; their respective names already define what they are. All I can recommend is to never do that. Doing it intentionally, to get good scores, borders on unethical practice.
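
To make the protocol concrete, below is a minimal sketch of the validate-then-test procedure in terms of the evaluation script. The flag names (--split, --resume, etc.) and the checkpoint paths are assumptions modeled on the training command earlier in this thread, not verified against test.py, so please check the actual argument names in the repository:
# 1. Model selection: evaluate candidate checkpoints on the 'val' split only,
#    and keep the single checkpoint with the best val score.
python test.py --model lavt_one --swin_type tiny --dataset refcoco --split val \
--resume ./checkpoints/candidate_epoch40.pth --img_size 480
# 2. Reporting: evaluate that one chosen checkpoint, unchanged, on the test splits.
python test.py --model lavt_one --swin_type tiny --dataset refcoco --split testA \
--resume ./checkpoints/best_on_val.pth --img_size 480
python test.py --model lavt_one --swin_type tiny --dataset refcoco --split testB \
--resume ./checkpoints/best_on_val.pth --img_size 480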