Batch size modification

Question

chandratejatiriveedhi opened this issue a year ago · comments

chandratejatiriveedhi commented a year ago

I am having an issue training the SETR model on the Cityscapes dataset with the config file SETR_PUP_768x768_40k_cityscapes_bs_8. I am training on a single GPU and get the following error: CUDA out of memory. Tried to allocate 326.00 MiB (GPU 0; 11.90 GiB total capacity; 10.88 GiB already allocated; 254.94 MiB free; 11.06 GiB reserved in total by PyTorch)

I want to reduce the batch size from 8 to 1. Where can I do this in the config file? Is it data = dict(samples_per_gpu=1)? And what is the ideal number of GPUs for training this model on the Cityscapes dataset?

Also, do you have an updated version of the code that runs on CUDA 11 and later?

Sixiao Zheng · Answer 1 · Sat May 13 2023 15:23:15 GMT+0800 (China Standard Time)

Thank you for your interest in our work. You can change the batch size at https://github.com/fudan-zvg/SETR/blob/main/configs/SETR/SETR_MLA_768x768_80k_cityscapes_bs_8.py#L61. If that does not solve the problem, you can try other datasets, try SETR-Naive, or change the image size at https://github.com/fudan-zvg/SETR/blob/main/configs/SETR/SETR_MLA_768x768_80k_cityscapes_bs_8.py#L7. For a fair comparison with other papers, we train on 8 GPUs with one sample per GPU. If you want to run SETR on CUDA 11, we recommend the SETR implementation in mmsegmentation: https://github.com/open-mmlab/mmsegmentation/tree/main/configs/setr.
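To make the batch-size arithmetic concrete, here is a minimal sketch. It assumes the mmsegmentation-style `data = dict(samples_per_gpu=...)` setting mentioned in the question; the helper `effective_batch_size` is a hypothetical name for illustration and is not part of the SETR code base.

```python
# Assumed mmsegmentation-style setting: images processed per GPU per iteration.
data = dict(samples_per_gpu=1)


def effective_batch_size(num_gpus: int, samples_per_gpu: int) -> int:
    """Total images per iteration under data-parallel training (hypothetical helper)."""
    return num_gpus * samples_per_gpu


# Setup described above: 8 GPUs with one sample per GPU.
reference = effective_batch_size(8, data["samples_per_gpu"])

# Single-GPU run with the same per-GPU setting.
single_gpu = effective_batch_size(1, data["samples_per_gpu"])

print(reference, single_gpu)
```

With one sample per GPU, a single-GPU run therefore sees a total batch of 1 rather than 8, so results may not be directly comparable with the 8-GPU setup.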

chandratejatiriveedhi · Answer 2 · Sun May 21 2023 15:18:25 GMT+0800 (China Standard Time)

Hi Zheng, when I run the training script, I get the error "AssertionError: Default process group is not initialized". What are the common causes of this error, and how can I resolve it? Is it because I am training on one GPU without distributed training? Please let me know. With regards, Teja


chandratejatiriveedhi · Answer 3 · Mon May 29 2023 01:46:58 GMT+0800 (China Standard Time)

Hi Zheng, do you have any further updates on this, or on the common causes of the assertion error? With regards, Teja
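For the "Default process group is not initialized" assertion, a commonly reported cause in mmsegmentation-style code bases is a `SyncBN` normalization layer being used when training is launched without the distributed launcher. The thread does not confirm that this is the cause here, so the following is only a sketch under that assumption. It swaps every `type='SyncBN'` entry for plain `BN` in a nested config dict; the helper `replace_syncbn` and the example `model` dict are hypothetical and do not come from the SETR repository.

```python
from copy import deepcopy


def replace_syncbn(cfg):
    """Return a copy of a nested config with every type='SyncBN' replaced by 'BN'."""
    cfg = deepcopy(cfg)

    def _walk(node):
        if isinstance(node, dict):
            if node.get("type") == "SyncBN":
                node["type"] = "BN"
            for value in node.values():
                _walk(value)
        elif isinstance(node, (list, tuple)):
            for value in node:
                _walk(value)

    _walk(cfg)
    return cfg


# Hypothetical model config, shaped like an mmsegmentation-style config.
model = dict(
    backbone=dict(norm_cfg=dict(type="SyncBN", requires_grad=True)),
    decode_head=dict(norm_cfg=dict(type="SyncBN", requires_grad=True)),
)

patched = replace_syncbn(model)
print(patched["backbone"]["norm_cfg"]["type"])
```

Another option often suggested is to keep `SyncBN` and start the run through the distributed launcher with a single GPU, so that a process group is initialized.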