How to resume training on colab upon session timeout?
kartikJ-9 opened this issue · comments
Decent results require 3k-5k frames. My GPU session on Colab gets disconnected due to usage limits while training. I am saving the checkpoints to Google Drive. Is there any way I can resume training from a particular epoch? I have a sequence of images obtained from a video. I am new to PyTorch. Somebody suggested saving the weights at each epoch and continuing from that checkpoint.
I also have the same issue. Please help!
I need some help with this too. I use the flags --continue_train and --which__epoch, but no matter what number I pass, the training begins from epoch 1.
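For anyone new to PyTorch, here is a minimal sketch of the general "save the weights each epoch and continue from that checkpoint" idea mentioned in the question. It is plain PyTorch and not this repository's own resume logic, so it says nothing about how --continue_train behaves here. The helper names `save_checkpoint` and `load_checkpoint`, the dictionary keys, and the tiny `nn.Linear` model are all made up for illustration. On Colab you would point the path at a folder inside your mounted Drive instead of a temporary directory.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(path, model, optimizer, epoch):
    """Store the epoch number together with model and optimizer state."""
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore state if a checkpoint exists and return the epoch to start from."""
    if not os.path.exists(path):
        return 1  # nothing saved yet, so start from the first epoch
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1  # continue with the epoch after the saved one


if __name__ == "__main__":
    # Hypothetical location; on Colab this would be a folder in your mounted Drive.
    ckpt_path = os.path.join(tempfile.mkdtemp(), "latest.pth")

    model = nn.Linear(4, 2)  # stand-in for the real network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    start_epoch = load_checkpoint(ckpt_path, model, optimizer)
    for epoch in range(start_epoch, 4):
        # One dummy training step per epoch.
        loss = model(torch.randn(8, 4)).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Overwrite the checkpoint at the end of every epoch.
        save_checkpoint(ckpt_path, model, optimizer, epoch)

    print("next run would start at epoch", load_checkpoint(ckpt_path, model, optimizer))
```

The point of storing the epoch number inside the checkpoint is that the training loop can read its starting epoch back from the file rather than always beginning at 1. If a script always restarts at epoch 1 even though the weights load, it may be worth checking whether the loop's start value is actually taken from the saved state.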