The loss value is very large

Question

ss24cs opened this issue 4 years ago · comments

Hi buddy, your demo is wonderful. I have a question about training with the train.py demo:
I trained on the CelebA face dataset and got the following loss values:
[CelebALandmarkVal] PSNR: 9.29 SSIM: 0.0175 Loss: 1433107.623884 Best PSNR: 9.40 in Step: [10000]
===> Saving last checkpoint to [../experiments/DIC_in3f48_x8_myself/epochs/step_0014000_ckp.pth] ...]
Fri Sep 25 11:16:13 2020 <epoch: 1, iter: 14,250, lr:1.250e-05> loss_g_pix: 4.3555e+06 loss_g_align: 9.2217e+18 loss_g_feature: 4.9574e+06 loss_g_GAN: 5.4255e+08 loss_g: 9.2217e+17 loss_d: 0.0000e+00 loss_total: 9.2217e+17 pred_d_real: 3.3765e+01 pred_d_fake: -5.4255e+08
Fri Sep 25 11:18:13 2020 <epoch: 1, iter: 14,500, lr:1.250e-05> loss_g_pix: 4.7608e+06 loss_g_align: 1.0164e+19 loss_g_feature: 5.4015e+06 loss_g_GAN: 5.9279e+08 loss_g: 1.0164e+18 loss_d: 0.0000e+00 loss_total: 1.0164e+18 pred_d_real: 3.4847e+01 pred_d_fake: -5.9279e+08
Fri Sep 25 11:20:13 2020 <epoch: 1, iter: 14,750, lr:1.250e-05> loss_g_pix: 4.3234e+06 loss_g_align: 2.2932e+19 loss_g_feature: 4.8757e+06 loss_g_GAN: 5.3533e+08 loss_g: 2.2932e+18 loss_d: 0.0000e+00 loss_total: 2.2932e+18 pred_d_real: 2.9394e+01 pred_d_fake: -5.3533e+08
Is something wrong with my dataset, or with my train.json file? Please let me know what the problem might be. Thank you.
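
To see which term drives the total in log lines like the ones above, the `name: value` pairs can be parsed and compared. This is only a minimal sketch for reading the log: it assumes the log format shown in this thread, uses one line copied from it (with the timestamp and epoch header trimmed), and the helper names `parse_losses` and `largest_component` are hypothetical, not part of the repo.

```python
import re

# One log line copied from the thread (timestamp and epoch header trimmed).
LOG_LINE = (
    "loss_g_pix: 4.3555e+06 loss_g_align: 9.2217e+18 "
    "loss_g_feature: 4.9574e+06 loss_g_GAN: 5.4255e+08 "
    "loss_g: 9.2217e+17 loss_d: 0.0000e+00 loss_total: 9.2217e+17 "
    "pred_d_real: 3.3765e+01 pred_d_fake: -5.4255e+08"
)

# Matches pairs such as "loss_g_pix: 4.3555e+06" or "pred_d_fake: -5.4255e+08".
PAIR = re.compile(r"(\w+):\s*(-?\d+(?:\.\d+)?(?:e[+-]?\d+)?)", re.IGNORECASE)


def parse_losses(line):
    """Return {name: float} for every `name: value` pair in a log line."""
    return {name: float(value) for name, value in PAIR.findall(line)}


def largest_component(losses):
    """Return the name of the largest generator term (keys starting with loss_g_)."""
    parts = {k: v for k, v in losses.items() if k.startswith("loss_g_")}
    return max(parts, key=parts.get)


losses = parse_losses(LOG_LINE)
dominant = largest_component(losses)
ratio = losses["loss_g"] / losses["loss_g_align"]

print("largest generator term:", dominant)
print("loss_g / loss_g_align:", ratio)
```

For this line, `loss_g_align` is the largest generator term by roughly ten orders of magnitude, and `loss_g` (equal to `loss_total`) is about one tenth of it, so the alignment term accounts for essentially all of the total. The log also shows `loss_d` at zero and `loss_g_GAN` equal to the negated `pred_d_fake`. What causes the alignment term to blow up cannot be determined from the log alone.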

ss24cs · Answer 1 · Tue Sep 29 2020 15:23:22 GMT+0800 (China Standard Time)

Did I use the wrong config file? Also, how do I train on my own dataset?

Zhenyu Jiang · Answer 2 · Sat Oct 03 2020 11:43:14 GMT+0800 (China Standard Time)

Hi, what dataset are you using? Can you share your train.json with me? I can't tell what's wrong from these logs alone.

xiaoxia-cai · Answer 3 · Tue Oct 13 2020 18:34:49 GMT+0800 (China Standard Time)

Same issue here, +1.

JingzheLyp · Answer 4 · Wed Oct 28 2020 10:48:47 GMT+0800 (China Standard Time)

Hi, have you resolved this problem? @ss24cs

Zhenyu Jiang · Answer 5 · Thu Oct 29 2020 14:42:45 GMT+0800 (China Standard Time)

Hi guys, the loss explosion didn't happen in my case. I used the dataset and option file provided in this repo.
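
Since the explosion does not reproduce with the provided dataset and option file, it may help anyone hitting it to locate the step where the loss first diverges and compare the setup from that point. The following is a generic sketch, not part of this repo: the function name `first_divergent_step`, the `factor` threshold, and the sample values are all assumptions made for illustration.

```python
import math


def first_divergent_step(history, factor=1e3, eps=1e-12):
    """Return the first step whose loss is non-finite or exceeds `factor`
    times the smallest loss magnitude seen so far; return None otherwise.

    `history` is an iterable of (step, loss) pairs in training order.
    """
    best = math.inf
    for step, loss in history:
        if not math.isfinite(loss):
            return step
        # Only compare once at least one earlier loss has been recorded.
        if best < math.inf and abs(loss) > factor * max(best, eps):
            return step
        best = min(best, abs(loss))
    return None


# Made-up values for illustration only, not taken from the thread.
sample = [(1000, 0.9), (2000, 0.7), (3000, 0.65), (4000, 8.1e5), (5000, 9.2e17)]
print(first_divergent_step(sample))
```

Feeding it the per-step `loss_total` values from a training log would point to the earliest checkpoint worth inspecting; the threshold is arbitrary and should be adjusted to the scale of the losses involved.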