victorca25 / traiNNer

traiNNer: Deep learning framework for image and video super-resolution, restoration and image-to-image translation, for training and testing.

PPON Error when moving to Phase 2

N0manDemo opened this issue

I was training a model with PPON (192) + MultiScale + DiffAug, with AMP disabled because my GPU doesn't support it, and I received the following error when training switched to Phase 2:
error.log

21-01-27 11:26:52.449 - INFO: Random seed: 0
21-01-27 11:26:52.647 - INFO: Dataset [LRHRDataset - DIV2K] is created.
21-01-27 11:26:52.647 - INFO: Number of train images: 37,933, iters: 2,371
21-01-27 11:26:52.647 - INFO: Total epochs needed: 43 for iters 100,000
21-01-27 11:26:52.648 - INFO: Dataset [LRHRDataset - val_set14_part] is created.
21-01-27 11:26:52.648 - INFO: Number of val images in [val_set14_part]: 1
21-01-27 11:26:52.650 - INFO: AMP library available
21-01-27 11:26:52.827 - INFO: Initialization method [kaiming]
21-01-27 11:26:54.127 - INFO: Initialization method [kaiming]
21-01-27 11:26:54.185 - INFO: Loading pretrained model for G [../experiments/pretrained_models/PPON_G.pth] ...
21-01-27 11:26:55.276 - INFO: Network G structure: DataParallel - PPON, with parameters: 17,267,657
21-01-27 11:26:55.277 - INFO: Network D structure: DataParallel - MultiscaleDiscriminator, with parameters: 8,296,899
21-01-27 11:26:55.277 - INFO: Model [PPONModel] is created.
21-01-27 11:26:55.277 - INFO: Start training from epoch: 0, iter: 0
21-01-27 11:26:55.991 - INFO: Switching to phase: p2, step: 1
Traceback (most recent call last):
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 382, in <module>
    main()
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 378, in main
    fit(model, opt, dataloaders, steps_states, data_params, loggers)
  File "/mnt/ext4-storage/Training/BasicSR/codes/train.py", line 221, in fit
    model.optimize_parameters(virtual_step) # calculate loss functions, get gradients, update network weights
  File "/mnt/ext4-storage/Training/BasicSR/codes/models/ppon_model.py", line 199, in optimize_parameters
    l_g_total.backward()
AttributeError: 'float' object has no attribute 'backward'

Hello! Can you share your options configuration file?

Ah, I hadn't seen the error.log. For PPON, you first need to configure the losses (type, weight, etc.) as you normally would, and then pick which of those losses are used in each phase. In your case, the configuration should look something like this:

pixel_criterion: l1
pixel_weight: 1e-2
cx_weight: 0.5
cx_type: contextual
cx_vgg_layers: {conv_3_2: 1, conv_4_2: 1}
ssim_type: ms-ssim
ssim_weight: 1
ms_criterion: multiscale-l1
ms_weight: 1e-2
gan_type: vanilla
gan_weight: 0.005
p1_losses: ['pix']
p2_losses: ['pix-multiscale', 'ms-ssim']
p3_losses: ['contextual']

As you can see, the pixel loss, multiscale pixel loss, multiscale SSIM and contextual loss are all configured first, and the p1_losses, p2_losses and p3_losses lists then assign them to phases. Let me know if this fixes the problem.
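For anyone wondering why a missing loss shows up as this particular AttributeError, here is a minimal sketch of one plausible mechanism. It assumes the generator total starts as a plain Python float and only becomes a tensor once at least one configured loss is added to it; that is a guess from the traceback, not something checked against ppon_model.py. `FakeLoss` and `total_loss` are hypothetical stand-ins so the sketch runs without PyTorch.

```python
class FakeLoss:
    """Hypothetical stand-in for a tensor-like loss value (not a torch.Tensor)."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # FakeLoss + FakeLoss or FakeLoss + number
        other_value = other.value if isinstance(other, FakeLoss) else other
        return FakeLoss(self.value + other_value)

    def __radd__(self, other):
        # number + FakeLoss: the result is promoted to a FakeLoss
        return FakeLoss(other + self.value)

    def backward(self):
        return "backward called"


def total_loss(active_losses):
    """Sum the losses enabled for the current phase (assumed accumulation pattern)."""
    total = 0.0  # starts as a plain Python float
    for loss in active_losses:
        total = total + loss  # becomes tensor-like only if a loss is added
    return total


# With no usable loss for the phase, the total is still a float.
try:
    total_loss([]).backward()
except AttributeError as err:
    print(err)
```

If the phase's losses are all missing from the configuration, nothing is ever added to the total, so `.backward()` is called on a bare float.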

Thank you, that fixed the problem. I was missing quite a few options from the list.
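Since the cause was options missing from the configuration, a small pre-flight check along these lines could catch it before training starts. This is only a sketch: `REQUIRED_WEIGHT` and `missing_loss_options` are hypothetical names, and the mapping from phase loss names to weight options is inferred from the example configuration above, not taken from the traiNNer code.

```python
# Hypothetical mapping: phase loss name -> weight option it is assumed to need.
REQUIRED_WEIGHT = {
    "pix": "pixel_weight",
    "pix-multiscale": "ms_weight",
    "ms-ssim": "ssim_weight",
    "contextual": "cx_weight",
}


def missing_loss_options(train_opt):
    """Return a list of problems for losses used in a phase but not configured."""
    problems = []
    for phase in ("p1_losses", "p2_losses", "p3_losses"):
        for name in train_opt.get(phase, []):
            key = REQUIRED_WEIGHT.get(name)
            if key is None:
                problems.append(f"{phase}: unknown loss name '{name}'")
            elif not train_opt.get(key):
                # Missing key or a zero weight both leave the loss unused.
                problems.append(f"{phase}: '{name}' needs '{key}' to be set")
    return problems


# Example: p2 refers to losses whose weights were never configured.
incomplete = {
    "pixel_weight": 1e-2,
    "p1_losses": ["pix"],
    "p2_losses": ["pix-multiscale", "ms-ssim"],
}
for problem in missing_loss_options(incomplete):
    print(problem)
```

Running it on the parsed training options would list each phase loss that has no weight set, which is the situation described in this issue.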