Which size GPU for full resolution training?

robbiebarrat opened this issue 5 years ago · comments

Hi, sorry if this is a broad question, but I've looked around the repo and can't find the answer anywhere.

Assuming I only have one Titan XP GPU (12 GB), can I train this model at full resolution? I've been trying to modify the official TensorFlow implementation of Progressive Growing of GANs to fit in 12 GB, but I have to seriously cut back on the number of filters and the batch size to make it work.

Thanks so much.

Animesh Karnewar · Answer 1 · Mon Jul 22 2019 18:00:52 GMT+0800 (China Standard Time)

Hi @robbiebarrat,

Yes, I believe you can train the full model on your GPU. I have been able to train the whole model (same number of channels as the original paper and 1024 x 1024 resolution) on my GTX 1070 (8 GB) GPU.

You could optimize this by running the code at each depth and finding the largest batch size that fits for each one before starting the actual training. In my experience, you should be able to fit a batch size of maybe 2 at the highest resolution, but you'll have to try. Once you have these, you can proceed with setting the schedule for the progressive growing scheme.
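The per-depth probing described above could be sketched roughly as follows. This is only an illustration, not code from this repo: `fits_at_depth` is a hypothetical callable that, in practice, would build the networks at the given depth, run one training step with the given batch size, and return `False` if the framework reports an out-of-memory error. The `toy_fits` function and its budget are made-up stand-ins so the sketch runs without a GPU; they are not measurements.

```python
from typing import Callable, Dict, Iterable, Sequence


def largest_fitting_batch_size(
    fits: Callable[[int], bool], candidates: Iterable[int]
) -> int:
    """Return the largest candidate batch size that fits, or 0 if none do.

    Assumes memory use grows with batch size, so probing stops at the
    first candidate that fails.
    """
    best = 0
    for batch_size in sorted(candidates):
        if not fits(batch_size):
            break
        best = batch_size
    return best


def probe_all_depths(
    fits_at_depth: Callable[[int, int], bool],
    depths: Sequence[int],
    candidates: Sequence[int],
) -> Dict[int, int]:
    """Map each depth to the largest batch size that fits at that depth."""
    return {
        depth: largest_fitting_batch_size(
            # Bind depth now so each probe uses its own depth value.
            lambda batch_size, depth=depth: fits_at_depth(depth, batch_size),
            candidates,
        )
        for depth in depths
    }


# Toy stand-in for a real probe (hypothetical numbers, not measurements):
# pretend memory cost is batch_size * resolution^2 against a fixed budget.
TOY_BUDGET = 2 * 1024 * 1024


def toy_fits(depth: int, batch_size: int) -> bool:
    # Assumed convention: depth 0 is 4x4 and depth 8 is 1024x1024.
    resolution = 4 * 2 ** depth
    return batch_size * resolution * resolution <= TOY_BUDGET


if __name__ == "__main__":
    print(probe_all_depths(toy_fits, range(9), [1, 2, 4, 8, 16, 32, 64]))
```

The resulting depth-to-batch-size mapping is what you would then plug into the progressive growing schedule. Replacing `toy_fits` with a real single-step probe is the part that depends on your training code.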

Hope this helps.

Please feel free to let me know if you have any more questions.

Cheers 🍻!
@akanimax

Robbie Barrat · Answer 2 · Mon Jul 22 2019 18:19:58 GMT+0800 (China Standard Time)

@akanimax this helps a lot!! And seriously, thank you for such a fast reply.

I have been struggling with the official TensorFlow implementation because, even though I modified it to generate 512x1024 images instead of 1024x1024, it still won't fit on my GPU. I think NVIDIA made it on purpose to only fit on the super expensive 16 GB GPUs ;)

One last question: how long can I expect the training to take? How long did it take you to train the CelebA model in the examples?

Cheers!

Animesh Karnewar · Answer 3 · Mon Jul 22 2019 19:03:44 GMT+0800 (China Standard Time)

Well, I don't think NVIDIA would do something like that 😄. I believe it's because they have some lower-level optimization involved there. TBH, it was a little difficult for me to use the official pro_gan code too. The StyleGAN code is amazing, btw (very easy to reason about and use).

As for training time: I can't tell, and in fact no one can, especially with new datasets. You would probably spend a long time finding the right schedule for the progressive growing, and even then I feel you should expect weeks of training.

Btw, if relevant for you, you could give MSG-GAN a try: https://github.com/akanimax/BMSG-GAN. You can read the paper, https://arxiv.org/abs/1903.06048, for more information. With this, you'd probably have to reduce the batch size to something like 2 or 4 for your GPU, but MSG-GAN is a lot more hassle-free than ProGAN 😃!

Cheers 🍻!
@akanimax

ss32 · Answer 4 · Mon May 18 2020 04:27:22 GMT+0800 (China Standard Time)

> I have been able to train the whole model (same number of channels as the original paper and 1024 x 1024 resolution) on my GTX 1070 (8GB) gpu.

> In terms of training time. Well, I can't tell... in fact no one can. Especially with new datasets. You would probably spend a long time finding the perfect schedule for the progressive growing and even with the training, I feel you should expect weeks of training.

How long did it take you to train with the original dataset? I've been working with custom datasets on the order of 125k images and I've found that even a week of training isn't enough; I can't get good results even at lower resolutions.

benx13 · Answer 5 · Thu Apr 13 2023 04:31:48 GMT+0800 (China Standard Time)

Hi, thanks for the package. Can you provide more details on how you were able to train at full resolution on a 1070? I have a 1070 that is not connected to a display, and I can't seem to get past 512x512 with batch=1 and num_channels=1.
Running Ubuntu 22.04, torch 11.10.0.