Training on Colab - CUDA out of memory
Mancio98 opened this issue · comments
Hi, I would like to ask if anyone has tried to train the model on Colab. Yesterday I tried to launch training on a GPU, but it runs out of memory: it instantly fills almost all 15 GB. I tried smaller batch sizes (6 and 8), but the problem persists.
I also replaced the model used in the VQGAN training with the same one used for inference with the transformer (vq_f16).
Additionally, if @dome272 could upload pretrained weights for both models I would be grateful (I need them for my exam project at uni ahaha).
Many Thanks
Hi. I am facing the same problem. Have you solved it yet? I am going to try the pre-trained models provided at https://github.com/CompVis/taming-transformers/tree/master.
Hi, unfortunately not. One last thing I would like to try is gradient accumulation, but I don't think it will solve the problem.
By the way, I was planning to do the same as you. If I succeed I'll post my solution here.
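For reference, the idea behind gradient accumulation is to split a large batch into micro-batches and combine their gradients before a single optimizer step, so peak memory scales with the micro-batch size rather than the effective batch size (it won't help if even one micro-batch's activations don't fit). A minimal sketch of why this works, using a toy least-squares model in plain Python rather than the actual PyTorch training loop (all names here are illustrative):

```python
# Toy illustration of gradient accumulation: for a mean-squared-error loss,
# the full-batch gradient equals the size-weighted average of micro-batch
# gradients, so accumulating emulates a large batch at lower peak memory.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, micro_batch):
    """Accumulate micro-batch gradients, as a training loop would before step()."""
    total, count = 0.0, 0
    for i in range(0, len(xs), micro_batch):
        xb, yb = xs[i:i + micro_batch], ys[i:i + micro_batch]
        total += grad_mse(w, xb, yb) * len(xb)  # weight by micro-batch size
        count += len(xb)
    return total / count

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
w = 0.5
full = grad_mse(w, xs, ys)                              # one big batch
accum = accumulated_grad(w, xs, ys, micro_batch=2)      # two micro-batches
print(abs(full - accum) < 1e-12)  # the two gradients match
```

In a PyTorch loop this corresponds to calling `loss.backward()` on each micro-batch (gradients accumulate in `.grad`) and calling `optimizer.step()` / `optimizer.zero_grad()` only every N micro-batches, with the loss divided by N to keep the average.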
You can also find a pretrained VQGAN and a MaskGIT here: https://huggingface.co/llvictorll/Maskgit-pytorch/tree/main
Their GitHub: https://github.com/valeoai/Maskgit-pytorch.
They modified some parameters and a few other details of the transformer, so I suggest using only their VQGAN if you want to follow the original MaskGIT implementation.