qinenergy / corda

[ICCV 2021] Code for our paper Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation

How to continue training?

ygjwd12345 opened this issue

When I use a script like

CUDA_VISIBLE_DEVICES=0 python3 -u trainUDA_gta.py --config ./configs/configUDA_gta2city.json --name UDA-gta --resume /saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta/checkpoint-iter95000.pth | tee ./gta-corda.log

Training runs again, but the new checkpoint is not saved.

Hi.
The training skeleton is taken directly from DACS, and we didn't test the resume function. We trained the model uninterrupted for 250000 iterations.
For your specific use case, maybe this can help:

Change "--resume /saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta/checkpoint-iter95000.pth" to "--resume ../saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta/checkpoint-iter95000.pth", as the default save folder is one level up. The new checkpoints should show up in ../saved/DeepLabv2-depth-gtamono-cityscapestereo/05-03_02-13-UDA-gta-resume/

We didn't test this, and it may be easier to train from scratch for 250000 iterations to reproduce the results.
Please let me know if you have further questions.

I found that the error is caused by:

    if args.resume:
        checkpoint_dir = os.path.join(*args.resume.split('/')[:-1]) + '_resume-' + start_writeable
    else:
        checkpoint_dir = os.path.join(config['utils']['checkpoint_dir'], start_writeable + '-' + args.name)

I removed the resume branch:

    if args.resume:
        checkpoint_dir = os.path.join(*args.resume.split('/')[:-1]) + '_resume-' + start_writeable
    else:

leaving only the assignment from the else branch. The problem is solved.
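
For anyone hitting the same thing, here is a minimal sketch of why the resume branch can misplace checkpoints when `--resume` is an absolute path. Splitting on '/' produces an empty first component, and joining the pieces drops it, so the directory silently becomes relative to the current working directory. The function names and the timestamp value below are hypothetical and not from the repo, and `posixpath` stands in for `os.path` only so the demo behaves the same on any OS. Using `dirname` is one possible alternative and has not been tested against the repository.

```python
import posixpath  # POSIX path rules; os.path behaves the same way on Linux


def resume_dir_original(resume_path, start_writeable):
    """Mirror the expression quoted above from the training script."""
    parts = resume_path.split('/')[:-1]  # drop the checkpoint file name
    return posixpath.join(*parts) + '_resume-' + start_writeable


def resume_dir_dirname(resume_path, start_writeable):
    """Hypothetical alternative: dirname keeps a leading '/' intact."""
    return posixpath.dirname(resume_path) + '_resume-' + start_writeable


# Hypothetical inputs for illustration only.
absolute = '/saved/run/checkpoint-iter95000.pth'
relative = '../saved/run/checkpoint-iter95000.pth'
stamp = '05-04_10-00'

print(resume_dir_original(absolute, stamp))  # leading '/' is lost
print(resume_dir_dirname(absolute, stamp))   # leading '/' is kept
print(resume_dir_original(relative, stamp))  # relative paths are unaffected
```

With a relative path such as the "../saved/..." form suggested earlier in the thread, both variants give the same directory, which is consistent with that suggestion avoiding the problem.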