Restarting interrupted training / checkpoints
deccolquitt opened this issue
Is there any way to restart interrupted training? I can't see a checkpoint-related command in train_main.py.
Hi, I haven't implemented such a feature, but it should be quite easy: since training is done for each scale independently, and the networks are saved after each finished scale, you could resume training from the last finished scale. If you decide to implement this, I would be happy to add the feature to the repository.
Gal
Unfortunately I don't know enough about coding to do this; whenever I have tried using the same dataset, it just creates a new directory and starts from scratch. Thanks anyway.
Would this be the right sort of thing to look at? https://stackoverflow.com/questions/42703500/best-way-to-save-a-trained-model-in-pytorch
Yes, but this is already done during training. If you want to implement continuation of an existing model, you have to load its saved networks and then continue training the following scales.
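The resume logic described above could be sketched roughly as follows. This is only a hypothetical illustration, not code from the repository: it assumes each finished scale N is saved under a per-scale subdirectory (e.g. `out_dir/N/netG.pth`), so the script can detect the last completed scale and skip straight to the next one instead of starting a fresh run.

```python
import os

def last_finished_scale(out_dir):
    """Return the highest scale index with a saved generator, or -1 if none.

    Assumes (hypothetically) that a finished scale N leaves a file at
    out_dir/N/netG.pth; the actual layout in the repository may differ.
    """
    last = -1
    if not os.path.isdir(out_dir):
        return last
    for name in os.listdir(out_dir):
        if name.isdigit() and os.path.isfile(os.path.join(out_dir, name, "netG.pth")):
            last = max(last, int(name))
    return last

def resume_start_scale(out_dir):
    """Scale index at which training should (re)start."""
    return last_finished_scale(out_dir) + 1
```

In a training script, one would then load the saved networks for scales 0 through `last_finished_scale(out_dir)` (e.g. with `torch.load`) to rebuild the pyramid, and run the training loop only for scales from `resume_start_scale(out_dir)` onward.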