How to resume training on colab upon session timeout?

Question


kartikJ-9 opened this issue 4 years ago

Decent results require 3k-5k frames, and my Colab GPU session gets disconnected partway through training because of usage limits. I am saving the checkpoints to Google Drive. Is there any way I can resume training from a particular epoch? My data is a sequence of images extracted from a video. I am new to PyTorch; somebody suggested saving the weights at each epoch and continuing from that checkpoint.
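The suggestion above can be sketched as follows. This is a minimal, hypothetical example, not the training code from this project: it assumes a generic PyTorch `model` and `optimizer`, and the path and function names (`CKPT_PATH`, `save_checkpoint`, `load_checkpoint`, `train`) are made up for illustration. The key point is to store the last completed epoch together with the weights and optimizer state, and to start the loop from that epoch + 1 when resuming.

```python
import os

import torch

# Hypothetical location; on Colab this assumes Google Drive is mounted at
# /content/drive so the file survives a runtime disconnect.
CKPT_PATH = "/content/drive/MyDrive/checkpoints/last.pt"


def save_checkpoint(path, model, optimizer, epoch):
    """Save weights, optimizer state, and the last *completed* epoch."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore model/optimizer in place; return the epoch to resume from."""
    if not os.path.exists(path):
        return 1  # no checkpoint yet: start from the first epoch
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1  # continue with the next epoch, not epoch 1


def train(path, num_epochs, model, optimizer):
    """Run epochs start..num_epochs, checkpointing after each one."""
    start = load_checkpoint(path, model, optimizer)
    for epoch in range(start, num_epochs + 1):
        # ... forward/backward/optimizer.step() over the frames goes here ...
        save_checkpoint(path, model, optimizer, epoch)
    return start
```

On Colab, Drive would be mounted first with `from google.colab import drive; drive.mount("/content/drive")`; after a disconnect, re-running the same cell that calls `train(CKPT_PATH, ...)` picks up from the saved epoch instead of epoch 1. Saving the optimizer state as well as the weights matters for optimizers such as Adam, whose running statistics would otherwise restart from zero.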

Renish Charaniya · Answer 1 · Fri Sep 25 2020 02:58:15 GMT+0800 (China Standard Time)

I also have the same issue. Please help!

Beatriz Costa · Answer 2 · Thu Apr 29 2021 01:17:55 GMT+0800 (China Standard Time)

I need help with this too: I pass the flags --continue_train and --which_epoch, but no matter what number I give, training begins from epoch 1.
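One plausible cause of this symptom, assuming the script behaves like some GAN training code bases, is that `--continue_train` reloads the weights selected by the epoch flag while the training loop's starting epoch comes from a separate setting that still defaults to 1. For example, pytorch-CycleGAN-and-pix2pix keeps the starting epoch in its own `--epoch_count` option; whether the code in this thread works the same way is an assumption. The sketch below is a hypothetical script (not any repository's actual code) that reuses the flag names from the post and derives the start epoch from a small JSON file saved next to each checkpoint; `--n_epochs` and `--checkpoint_dir` are made-up options.

```python
import argparse
import json
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--continue_train", action="store_true")
    parser.add_argument("--which_epoch", default="latest")
    parser.add_argument("--n_epochs", type=int, default=10)  # hypothetical
    parser.add_argument("--checkpoint_dir", default="checkpoints")  # hypothetical
    return parser.parse_args(argv)


def state_path(args, tag):
    """Path of the small metadata file stored beside each checkpoint."""
    return os.path.join(args.checkpoint_dir, f"{tag}_state.json")


def save_state(args, epoch):
    """Record the completed epoch under its own tag and under 'latest'."""
    os.makedirs(args.checkpoint_dir, exist_ok=True)
    for tag in (str(epoch), "latest"):
        with open(state_path(args, tag), "w") as f:
            json.dump({"epoch": epoch}, f)


def start_epoch(args):
    """Resume after the checkpoint's epoch instead of a fixed default of 1."""
    if not args.continue_train:
        return 1
    with open(state_path(args, args.which_epoch)) as f:
        return json.load(f)["epoch"] + 1


def run(argv=None):
    args = parse_args(argv)
    epochs_run = []
    for epoch in range(start_epoch(args), args.n_epochs + 1):
        # (load weights for args.which_epoch and train one epoch here)
        epochs_run.append(epoch)
        save_state(args, epoch)
    return epochs_run
```

With this layout, a first run with `--n_epochs 3` trains epochs 1 to 3, and a later run with `--continue_train --n_epochs 5` trains only epochs 4 and 5, because the loop range is tied to the checkpoint that was actually loaded.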