About training code
Pexure opened this issue
Hi there. Thanks for your amazing work, but I have some questions about the training code.

- Do we need to modify `main_train_psnr.py` (KAIR) to set the number of training iterations to 500K? The original file runs for 1M epochs.
- I launched training with `python -m torch.distributed.launch --nproc_per_node=8 --master_port=1234 main_train_psnr.py --opt options/swinir/train_swinir_sr_classical.json --dist True` on 8 RTX 3090 GPUs, with the DIV2K train split (default X2) as the dataset. The estimated training time for 500K iterations is ~3.5 days (1 min / 100 iterations), much longer than your 1.8 days on 8 2080 Ti GPUs. Do you have any idea why?
1. Yes, 500K iterations is enough for SR.
2. No idea. Maybe you can increase `n_workers`. Or you can try the code here.
Thanks for your reply. I have found the reason: I'm new to SR and missed the data preparation step described in BasicSR. I think it would be better to make this clear in KAIR :)
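For other readers hitting the same slowdown: the data preparation step crops the large DIV2K images into small sub-images once, offline, so the data loader reads small files instead of decoding 2K-resolution images every iteration. A self-contained sketch of that idea (patch size and step here are illustrative values, not necessarily BasicSR's defaults):

```python
# Hedged sketch of sub-image extraction as done offline in BasicSR-style
# data preparation: crop each large training image into a regular grid of
# overlapping patches that are saved to disk and loaded during training.
import numpy as np

def extract_subimages(img, crop_size=480, step=240):
    """Return crop_size x crop_size patches taken on a regular grid."""
    h, w = img.shape[:2]
    patches = []
    for top in range(0, h - crop_size + 1, step):
        for left in range(0, w - crop_size + 1, step):
            patches.append(img[top:top + crop_size, left:left + crop_size])
    return patches
```

In the real pipeline each patch would be written out as its own image file; the loader then samples random patches cheaply instead of reading and cropping full-size images on the fly.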
Thanks for your advice.