PDillis / stylegan3-fun

Modifications of the official PyTorch implementation of StyleGAN3. Let's easily generate images and videos with StyleGAN2/2-ADA/3!

train.py failed to run

1378dm opened this issue · comments

I tried to train on images with transparency using train.py from the rgba branch, but it still fails. Here are the error messages; can you help me with it?

Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "filtered_lrelu_plugin"... Done.

Generator                    Parameters  Buffers  Output shape       Datatype
---                          ---         ---      ---                ---
mapping.fc0                  262656      -        [32, 512]          float32
mapping.fc1                  262656      -        [32, 512]          float32
mapping                      -           512      [32, 16, 512]      float32
synthesis.input.affine       2052        -        [32, 4]            float32
synthesis.input              262144      1545     [32, 512, 36, 36]  float32
synthesis.L0_36_512.affine   262656      -        [32, 512]          float32
synthesis.L0_36_512          2359808     25       [32, 512, 36, 36]  float16
synthesis.L1_36_512.affine   262656      -        [32, 512]          float32
synthesis.L1_36_512          2359808     25       [32, 512, 36, 36]  float16
synthesis.L2_36_512.affine   262656      -        [32, 512]          float32
synthesis.L2_36_512          2359808     25       [32, 512, 36, 36]  float16
synthesis.L3_36_512.affine   262656      -        [32, 512]          float32
synthesis.L3_36_512          2359808     25       [32, 512, 36, 36]  float16
synthesis.L4_52_512.affine   262656      -        [32, 512]          float32
synthesis.L4_52_512          2359808     37       [32, 512, 52, 52]  float16
synthesis.L5_52_512.affine   262656      -        [32, 512]          float32
synthesis.L5_52_512          2359808     25       [32, 512, 52, 52]  float16
synthesis.L6_52_512.affine   262656      -        [32, 512]          float32
synthesis.L6_52_512          2359808     25       [32, 512, 52, 52]  float16
synthesis.L7_52_512.affine   262656      -        [32, 512]          float32
synthesis.L7_52_512          2359808     25       [32, 512, 52, 52]  float16
synthesis.L8_84_512.affine   262656      -        [32, 512]          float32
synthesis.L8_84_512          2359808     37       [32, 512, 84, 84]  float16
synthesis.L9_84_512.affine   262656      -        [32, 512]          float32
synthesis.L9_84_512          2359808     25       [32, 512, 84, 84]  float16
synthesis.L10_84_512.affine  262656      -        [32, 512]          float32
synthesis.L10_84_512         2359808     25       [32, 512, 84, 84]  float16
synthesis.L11_84_512.affine  262656      -        [32, 512]          float32
synthesis.L11_84_512         2359808     25       [32, 512, 84, 84]  float16
synthesis.L12_84_512.affine  262656      -        [32, 512]          float32
synthesis.L12_84_512         2359808     25       [32, 512, 84, 84]  float16
synthesis.L13_64_512.affine  262656      -        [32, 512]          float32
synthesis.L13_64_512         2359808     25       [32, 512, 64, 64]  float16
synthesis.L14_64_4.affine    262656      -        [32, 512]          float32
synthesis.L14_64_4           2052        1        [32, 4, 64, 64]    float16
synthesis                    -           -        [32, 4, 64, 64]    float32
---                          ---         ---      ---                ---
Total                        37768712    2432     -                  -

Setting up PyTorch plugin "upfirdn2d_plugin"... Done.

Discriminator  Parameters  Buffers  Output shape       Datatype
---            ---         ---      ---                ---
b64.fromrgb    2560        16       [32, 512, 64, 64]  float16
b64.skip       262144      16       [32, 512, 32, 32]  float16
b64.conv0      2359808     16       [32, 512, 64, 64]  float16
b64.conv1      2359808     16       [32, 512, 32, 32]  float16
b64            -           16       [32, 512, 32, 32]  float16
b32.skip       262144      16       [32, 512, 16, 16]  float16
b32.conv0      2359808     16       [32, 512, 32, 32]  float16
b32.conv1      2359808     16       [32, 512, 16, 16]  float16
b32            -           16       [32, 512, 16, 16]  float16
b16.skip       262144      16       [32, 512, 8, 8]    float16
b16.conv0      2359808     16       [32, 512, 16, 16]  float16
b16.conv1      2359808     16       [32, 512, 8, 8]    float16
b16            -           16       [32, 512, 8, 8]    float16
b8.skip        262144      16       [32, 512, 4, 4]    float16
b8.conv0       2359808     16       [32, 512, 8, 8]    float16
b8.conv1       2359808     16       [32, 512, 4, 4]    float16
b8             -           16       [32, 512, 4, 4]    float16
b4.mbstd       -           -        [32, 513, 4, 4]    float32
b4.conv        2364416     16       [32, 512, 4, 4]    float32
b4.fc          4194816     -        [32, 512]          float32
b4.out         513         -        [32, 1]            float32
---            ---         ---      ---                ---
Total          26489345    288      -                  -

Setting up augmentation...
Distributing across 1 GPUs...
Setting up training phases...
Exporting sample images...
Initializing logs...
Skipping tfevents export: No module named 'tensorboard'
Training for 10000 kimg...

C:\Users\John\Desktop\New\stylegan3-fun-rgba\training\augment.py:231: UserWarning: Specified kernel cache directory could not be created! This disables kernel caching. Specified directory is C:\Users\John\AppData\Local\Temp/torch/kernels. This warning will appear only once per process. (Triggered internally at  C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\jit_utils.cpp:860.)
  s = torch.exp2(torch.randn([batch_size], device=device) * self.scale_std)
Traceback (most recent call last):
  File "train.py", line 330, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "C:\Users\John\AppData\Local\Programs\Python\Python38\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\John\AppData\Local\Programs\Python\Python38\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\John\AppData\Local\Programs\Python\Python38\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\John\AppData\Local\Programs\Python\Python38\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "train.py", line 323, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 92, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "train.py", line 50, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "C:\Users\John\Desktop\New\stylegan3-fun-rgba\training\training_loop.py", line 279, in training_loop
    loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval, cur_nimg=cur_nimg)
  File "C:\Users\John\Desktop\New\stylegan3-fun-rgba\training\loss.py", line 75, in accumulate_gradients
    gen_logits = self.run_D(gen_img, gen_c, blur_sigma=blur_sigma)
  File "C:\Users\John\Desktop\New\stylegan3-fun-rgba\training\loss.py", line 59, in run_D
    img = self.augment_pipe(img)
  File "C:\Users\John\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\John\Desktop\New\stylegan3-fun-rgba\training\augment.py", line 370, in forward
    raise ValueError('Image must be RGB (3 channels) or L (1 channel)')
ValueError: Image must be RGB (3 channels) or L (1 channel)
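For context, the failing check in augment.py assumes 1- or 3-channel images. A hedged sketch of one way a homogeneous color-matrix augmentation step could be generalized to 4-channel RGBA input is shown below; the helper name and the choice to pass alpha through untouched are illustrative assumptions, not the repository's actual fix.

```python
import torch

def apply_color_transform(img: torch.Tensor, C: torch.Tensor) -> torch.Tensor:
    """Apply a batched 4x4 homogeneous color matrix C to the RGB channels of img.

    img: [N, ch, H, W] with ch in {1, 3, 4}. For RGBA input, the alpha channel
    is passed through untouched (an illustrative choice for this sketch).
    C: [N, 4, 4] homogeneous color transform per batch element.
    """
    n, ch, h, w = img.shape
    if ch == 4:
        rgb, alpha = img[:, :3], img[:, 3:]
    elif ch == 3:
        rgb, alpha = img, None
    elif ch == 1:
        # Grayscale: broadcast to three channels before transforming.
        rgb, alpha = img.repeat(1, 3, 1, 1), None
    else:
        raise ValueError('Image must be RGBA, RGB, or L')
    # Flatten spatial dims, append a homogeneous coordinate, apply C.
    x = rgb.reshape(n, 3, -1)
    ones = torch.ones(n, 1, x.shape[2], device=img.device, dtype=img.dtype)
    x = torch.cat([x, ones], dim=1)          # [N, 4, H*W]
    x = C[:, :3, :] @ x                      # keep the first 3 output rows
    rgb = x.reshape(n, 3, h, w)
    return rgb if alpha is None else torch.cat([rgb, alpha], dim=1)
```

With an identity matrix the RGBA input comes back unchanged, alpha included, which is the behavior the error above suggests is missing.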

You are not using the rgba branch. How did you clone the repository? The easiest way is git clone https://github.com/PDillis/stylegan3-fun.git -b rgba. If you don't specify the branch, you will get the main branch, and I haven't moved the changes there (yet).

I'm sure I've cloned the RGBA branch. The error seems to appear in augment.py, but you don't seem to have modified it.

You're completely right, thank you for pointing it out, and sorry for not reading your error more carefully. I forgot to change augment.py: I usually only use --augpipe=bg, so it didn't occur to me that the color augmentations would of course fail. If you can show me the complete command you use to train, in case I'm missing something else, that would be extremely helpful. I'll take a look and update the code as soon as possible.

I am not using --augpipe in my config; this is the command I am running:
python train.py --cfg=stylegan3-t --data=datasets/test_1.zip --gpus=1 --batch=32 --gamma=6.6 --kimg=10000 --snap=50 --img-snap=10 --snap-res=1080p --workers=1 --batch-gpu=32

Ok, thanks. By default you will use --aug=ada and --augpipe=bgc, so the color augmentations need to be made to work correctly with RGBA data. It may be simpler than I think, but I'll need a bit of time to check that everything is right. If you want to train something in the meantime without the color augmentations, set --augpipe=bg and it should train (I am currently training a --cfg=stylegan3-r with only blit and geometric augmentations and everything is going well). I'll update here once the corrections for the color augmentations are done.
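For reference, applied to the command quoted above, the workaround would look like the following; the only change is the added --augpipe=bg flag (all other flags are kept as in the original command):

```shell
# Blit + geometric augmentations only, skipping the color ops that
# currently fail on 4-channel data:
python train.py --cfg=stylegan3-t --data=datasets/test_1.zip --gpus=1 \
    --batch=32 --gamma=6.6 --kimg=10000 --snap=50 --img-snap=10 \
    --snap-res=1080p --workers=1 --batch-gpu=32 --augpipe=bg
```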

By the way, how should the rgba model be used? Is it possible to use gen_images.py directly?

I haven't touched gen_images.py, but generate.py works so far, even for saving videos. However, the .mp4 format (which is what I mostly use) does not support an alpha channel, so I need to think a bit about how to handle this; for now I only save the RGB part for the video, but I imagine generating images is more important for you. Run python generate.py images --help to see all the available options.
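The RGB-only video workaround described above amounts to dropping or compositing out the alpha channel before frames reach the .mp4 encoder. A minimal sketch, assuming uint8 HxWx4 frames (the helper name and the solid-background compositing choice are illustrative, not necessarily what the repository does):

```python
import numpy as np

def rgba_to_rgb_frame(frame: np.ndarray, background: int = 255) -> np.ndarray:
    """Composite an HxWx4 uint8 RGBA frame over a solid background color so
    it can be written to an .mp4 stream, which has no alpha channel."""
    rgb = frame[..., :3].astype(np.float32)
    alpha = frame[..., 3:].astype(np.float32) / 255.0
    out = rgb * alpha + background * (1.0 - alpha)
    return out.round().astype(np.uint8)
```

Simply slicing off `frame[..., :3]` also works, but compositing over a background avoids stray colors showing through in transparent regions.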

See if 3b107d4 fixes the augmentations issue you were having.