akanimax / pro_gan_pytorch

Unofficial PyTorch implementation of the paper "Progressive Growing of GANs for Improved Quality, Stability, and Variation"

Help restarting after GPU out-of-memory error at 3 days

djproc opened this issue

Hi, I'm too much of a noob to figure this out myself, and would love to know if there is a simple answer.

I was training for 3 days on my own dataset and was loving the results at 256x256. Unfortunately, as soon as training progressed to the next resolution, the batch size was too large for my GPU to handle. I guess I'll have to make it bs=2 or 1 (currently 4).

Is there a way to restart the training from the end point of 256x256? I don't want to start all over again...

THANK YOU!

Djproc

P.S. This is some fantastic work you have done, and I'm really appreciative that you've made it so easy to get started!

@djproc,

Yes, in order to restart training from 256 x 256, you need to do the following (a rough sketch is shown after the list):
1.) Set the start depth to 7.
2.) Provide all five .pth files to the training script, viz. generator_weights, discriminator_weights, stable_generator_weights, generator_optimizer and discriminator_optimizer.
3.) Ensure that the fade-in alpha is working properly.
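
Roughly, resuming could look like the minimal sketch below. Treat it as a sketch under assumptions, not a definitive recipe: the checkpoint filenames, the trainer attributes (`gen`, `dis`, `gen_shadow`, `gen_optim`, `dis_optim`), the dataset construction, and the `train(..., start_depth=...)` arguments are assumptions here, so match them against the version of the training script you are actually running.

```python
import torch
import pro_gan_pytorch.PRO_GAN as pg
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# depth=8 covers resolutions 4x4 ... 512x512; adjust to your setup.
pro_gan = pg.ProGAN(depth=8, latent_size=512, device=device)

# Load all five .pth files saved at the end of the 256x256 stage.
# (These filenames are examples; use whatever your run actually wrote out.)
pro_gan.gen.load_state_dict(torch.load("models/GAN_GEN_6.pth", map_location=device))
pro_gan.dis.load_state_dict(torch.load("models/GAN_DIS_6.pth", map_location=device))
pro_gan.gen_shadow.load_state_dict(torch.load("models/GAN_GEN_SHADOW_6.pth", map_location=device))
pro_gan.gen_optim.load_state_dict(torch.load("models/GAN_GEN_OPTIM_6.pth", map_location=device))
pro_gan.dis_optim.load_state_dict(torch.load("models/GAN_DIS_OPTIM_6.pth", map_location=device))

# Rebuild the same dataset you trained on originally (path is a placeholder;
# your script may use the repo's own data tools instead of torchvision).
dataset = datasets.ImageFolder(
    root="path/to/your/dataset",
    transform=transforms.Compose([
        transforms.Resize(512),
        transforms.CenterCrop(512),
        transforms.ToTensor(),
    ]),
)

# Resume at start_depth=7, with smaller batches at the high resolutions so
# the GPU does not run out of memory again; one list entry per depth.
pro_gan.train(
    dataset=dataset,
    epochs=[20] * 8,
    batch_sizes=[32, 32, 16, 16, 8, 8, 4, 2],
    fade_in_percentage=[50] * 8,   # drives the fade-in alpha per depth
    start_depth=7,
)
```

Since the optimizer state dicts are restored along with the weights, Adam's moment estimates carry over and training continues roughly where it left off instead of re-warming from scratch.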

Please let me know if you are facing any more problems.

Cheers 🍻!
@akanimax