PPON Error when moving to Phase 2

Question

N0manDemo opened this issue 4 years ago

I was training a model with PPON (192) + MultiScale + DiffAug and received the following error when moving to Phase 2. AMP is disabled because my GPU doesn't support it.
error.log:

21-01-27 11:26:52.449 - INFO: Random seed: 0
21-01-27 11:26:52.647 - INFO: Dataset [LRHRDataset - DIV2K] is created.
21-01-27 11:26:52.647 - INFO: Number of train images: 37,933, iters: 2,371
21-01-27 11:26:52.647 - INFO: Total epochs needed: 43 for iters 100,000
21-01-27 11:26:52.648 - INFO: Dataset [LRHRDataset - val_set14_part] is created.
21-01-27 11:26:52.648 - INFO: Number of val images in [val_set14_part]: 1
21-01-27 11:26:52.650 - INFO: AMP library available
21-01-27 11:26:52.827 - INFO: Initialization method [kaiming]
21-01-27 11:26:54.127 - INFO: Initialization method [kaiming]
21-01-27 11:26:54.185 - INFO: Loading pretrained model for G [../experiments/pretrained_models/PPON_G.pth] ...
21-01-27 11:26:55.276 - INFO: Network G structure: DataParallel - PPON, with parameters: 17,267,657
21-01-27 11:26:55.277 - INFO: Network D structure: DataParallel - MultiscaleDiscriminator, with parameters: 8,296,899
21-01-27 11:26:55.277 - INFO: Model [PPONModel] is created.
21-01-27 11:26:55.277 - INFO: Start training from epoch: 0, iter: 0
21-01-27 11:26:55.991 - INFO: Switching to phase: p2, step: 1
Traceback (most recent call last):
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 382, in <module>
    main()
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 378, in main
    fit(model, opt, dataloaders, steps_states, data_params, loggers)
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 221, in fit
    model.optimize_parameters(virtual_step) # calculate loss functions, get gradients, update network weights
  File "/mnt/ext4-storage/Training/BasicSR/codes/models/ppon_model.py", line 199, in optimize_parameters
    l_g_total.backward()
AttributeError: 'float' object has no attribute 'backward'

victorca25 · Answer 1 · Thu Jan 28 2021 20:20:17 GMT+0800 (China Standard Time)

Hello! Can you share your options configuration file?

victorca25 · Answer 2 · Thu Jan 28 2021 21:23:37 GMT+0800 (China Standard Time)

Ah, I didn't see the error.log. For PPON, you first need to configure the losses (type, weight, etc.) as you normally would, and then pick which of those losses are used in each phase. In your case, the configuration should look something like this:

pixel_criterion: l1
pixel_weight: 1e-2
cx_weight: 0.5
cx_type: contextual
cx_vgg_layers: {conv_3_2: 1, conv_4_2: 1}
ssim_type: ms-ssim
ssim_weight: 1
ms_criterion: multiscale-l1
ms_weight: 1e-2
gan_type: vanilla
gan_weight: 0.005
p1_losses: ['pix']
p2_losses: ['pix-multiscale', 'ms-ssim']
p3_losses: ['contextual']

As you can see, the pixel loss, multiscale pixel loss, multiscale SSIM, and contextual loss are all configured before the phase lists refer to them. Let me know if this fixes the problem.
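A quick way to catch this kind of mismatch before training is to check that every loss named in a phase list has its options present. The sketch below is only illustrative: the function `find_unconfigured_losses` and the `REQUIRED_KEYS` mapping are hypothetical, and the mapping from loss names to option keys is inferred from the example configuration above rather than taken from the trainer's source.

```python
# Assumed mapping from phase loss names to the option keys that configure them.
# Inferred from the example configuration in this thread, not from BasicSR code.
REQUIRED_KEYS = {
    "pix": ("pixel_criterion", "pixel_weight"),
    "pix-multiscale": ("ms_criterion", "ms_weight"),
    "ms-ssim": ("ssim_type", "ssim_weight"),
    "contextual": ("cx_type", "cx_weight"),
}

PHASE_KEYS = ("p1_losses", "p2_losses", "p3_losses")


def find_unconfigured_losses(train_opt):
    """Return {phase: {loss_name: [missing option keys]}} for a dict of options."""
    problems = {}
    for phase in PHASE_KEYS:
        for loss_name in train_opt.get(phase, []):
            required = REQUIRED_KEYS.get(loss_name)
            if required is None:
                continue  # unknown loss name: this sketch cannot check it
            missing = [key for key in required if key not in train_opt]
            if missing:
                problems.setdefault(phase, {})[loss_name] = missing
    return problems


# Options with only the pixel loss configured, but phase 2 asking for more.
incomplete_opt = {
    "pixel_criterion": "l1",
    "pixel_weight": 1e-2,
    "p1_losses": ["pix"],
    "p2_losses": ["pix-multiscale", "ms-ssim"],
}

for phase, losses in find_unconfigured_losses(incomplete_opt).items():
    for loss_name, missing in losses.items():
        print(f"{phase}: '{loss_name}' is missing {missing}")
```

An empty result means every loss named in a phase list has its type and weight options present, under the assumed mapping.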

N0manDemo · Answer 3 · Fri Jan 29 2021 09:44:22 GMT+0800 (China Standard Time)

Thank you, that fixed the problem. I was missing quite a few options from the list.
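For anyone who lands here with the same traceback, one plausible reading of the error is that no loss term was added to the generator total in phase 2, so the total was still a plain Python float when `backward()` was called. The sketch below reproduces that situation in plain Python. The helpers `accumulate_losses` and `check_has_backward` are hypothetical, and starting the total at `0.0` is an assumption about how such a total is typically built, not a quote from `ppon_model.py`.

```python
def accumulate_losses(loss_terms):
    """Add up loss terms onto a running total."""
    total = 0.0  # assumption: the total starts life as a Python float
    for term in loss_terms:
        total = total + term
    return total


def check_has_backward(total, phase):
    """Raise a descriptive error if nothing differentiable was accumulated."""
    if not hasattr(total, "backward"):
        raise ValueError(
            f"No loss terms were accumulated for phase {phase!r}; "
            "check that every loss named for this phase is configured."
        )
    return total


# With no loss enabled for the phase, the total never becomes a tensor.
empty_total = accumulate_losses([])
try:
    empty_total.backward()
except AttributeError as err:
    print(err)
```

A guard like `check_has_backward` would turn the bare `AttributeError` into a message that points at the configuration.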