train.py failed to run
1378dm opened this issue
I tried to train on images with transparency using `train.py` from the `rgba` branch, but it still fails. Here are the error messages; can you help me with it?
Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "filtered_lrelu_plugin"... Done.
Generator                     Parameters  Buffers  Output shape       Datatype
---                           ---         ---      ---                ---
mapping.fc0                   262656      -        [32, 512]          float32
mapping.fc1                   262656      -        [32, 512]          float32
mapping                       -           512      [32, 16, 512]      float32
synthesis.input.affine        2052        -        [32, 4]            float32
synthesis.input               262144      1545     [32, 512, 36, 36]  float32
synthesis.L0_36_512.affine    262656      -        [32, 512]          float32
synthesis.L0_36_512           2359808     25       [32, 512, 36, 36]  float16
synthesis.L1_36_512.affine    262656      -        [32, 512]          float32
synthesis.L1_36_512           2359808     25       [32, 512, 36, 36]  float16
synthesis.L2_36_512.affine    262656      -        [32, 512]          float32
synthesis.L2_36_512           2359808     25       [32, 512, 36, 36]  float16
synthesis.L3_36_512.affine    262656      -        [32, 512]          float32
synthesis.L3_36_512           2359808     25       [32, 512, 36, 36]  float16
synthesis.L4_52_512.affine    262656      -        [32, 512]          float32
synthesis.L4_52_512           2359808     37       [32, 512, 52, 52]  float16
synthesis.L5_52_512.affine    262656      -        [32, 512]          float32
synthesis.L5_52_512           2359808     25       [32, 512, 52, 52]  float16
synthesis.L6_52_512.affine    262656      -        [32, 512]          float32
synthesis.L6_52_512           2359808     25       [32, 512, 52, 52]  float16
synthesis.L7_52_512.affine    262656      -        [32, 512]          float32
synthesis.L7_52_512           2359808     25       [32, 512, 52, 52]  float16
synthesis.L8_84_512.affine    262656      -        [32, 512]          float32
synthesis.L8_84_512           2359808     37       [32, 512, 84, 84]  float16
synthesis.L9_84_512.affine    262656      -        [32, 512]          float32
synthesis.L9_84_512           2359808     25       [32, 512, 84, 84]  float16
synthesis.L10_84_512.affine   262656      -        [32, 512]          float32
synthesis.L10_84_512          2359808     25       [32, 512, 84, 84]  float16
synthesis.L11_84_512.affine   262656      -        [32, 512]          float32
synthesis.L11_84_512          2359808     25       [32, 512, 84, 84]  float16
synthesis.L12_84_512.affine   262656      -        [32, 512]          float32
synthesis.L12_84_512          2359808     25       [32, 512, 84, 84]  float16
synthesis.L13_64_512.affine   262656      -        [32, 512]          float32
synthesis.L13_64_512          2359808     25       [32, 512, 64, 64]  float16
synthesis.L14_64_4.affine     262656      -        [32, 512]          float32
synthesis.L14_64_4            2052        1        [32, 4, 64, 64]    float16
synthesis                     -           -        [32, 4, 64, 64]    float32
---                           ---         ---      ---                ---
Total                         37768712    2432     -                  -
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
Discriminator  Parameters  Buffers  Output shape       Datatype
---            ---         ---      ---                ---
b64.fromrgb    2560        16       [32, 512, 64, 64]  float16
b64.skip       262144      16       [32, 512, 32, 32]  float16
b64.conv0      2359808     16       [32, 512, 64, 64]  float16
b64.conv1      2359808     16       [32, 512, 32, 32]  float16
b64            -           16       [32, 512, 32, 32]  float16
b32.skip       262144      16       [32, 512, 16, 16]  float16
b32.conv0      2359808     16       [32, 512, 32, 32]  float16
b32.conv1      2359808     16       [32, 512, 16, 16]  float16
b32            -           16       [32, 512, 16, 16]  float16
b16.skip       262144      16       [32, 512, 8, 8]    float16
b16.conv0      2359808     16       [32, 512, 16, 16]  float16
b16.conv1      2359808     16       [32, 512, 8, 8]    float16
b16            -           16       [32, 512, 8, 8]    float16
b8.skip        262144      16       [32, 512, 4, 4]    float16
b8.conv0       2359808     16       [32, 512, 8, 8]    float16
b8.conv1       2359808     16       [32, 512, 4, 4]    float16
b8             -           16       [32, 512, 4, 4]    float16
b4.mbstd       -           -        [32, 513, 4, 4]    float32
b4.conv        2364416     16       [32, 512, 4, 4]    float32
b4.fc          4194816     -        [32, 512]          float32
b4.out         513         -        [32, 1]            float32
---            ---         ---      ---                ---
Total          26489345    288      -                  -
Setting up augmentation...
Distributing across 1 GPUs...
Setting up training phases...
Exporting sample images...
Initializing logs...
Skipping tfevents export: No module named 'tensorboard'
Training for 10000 kimg...
C:\Users\John\Desktop\New\stylegan3-fun-rgba\training\augment.py:231: UserWarning: Specified kernel cache directory could not be created! This disables kernel caching. Specified directory is C:\Users\John\AppData\Local\Temp/torch/kernels. This warning will appear only once per process. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\jit_utils.cpp:860.)
s = torch.exp2(torch.randn([batch_size], device=device) * self.scale_std)
Traceback (most recent call last):
  File "train.py", line 330, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "C:\Users\John\AppData\Local\Programs\Python\Python38\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\John\AppData\Local\Programs\Python\Python38\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\John\AppData\Local\Programs\Python\Python38\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\John\AppData\Local\Programs\Python\Python38\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "train.py", line 323, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 92, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "train.py", line 50, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "C:\Users\John\Desktop\New\stylegan3-fun-rgba\training\training_loop.py", line 279, in training_loop
    loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval, cur_nimg=cur_nimg)
  File "C:\Users\John\Desktop\New\stylegan3-fun-rgba\training\loss.py", line 75, in accumulate_gradients
    gen_logits = self.run_D(gen_img, gen_c, blur_sigma=blur_sigma)
  File "C:\Users\John\Desktop\New\stylegan3-fun-rgba\training\loss.py", line 59, in run_D
    img = self.augment_pipe(img)
  File "C:\Users\John\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\John\Desktop\New\stylegan3-fun-rgba\training\augment.py", line 370, in forward
    raise ValueError('Image must be RGB (3 channels) or L (1 channel)')
ValueError: Image must be RGB (3 channels) or L (1 channel)
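For reference, the failure mode boils down to a channel-count guard in the augmentation pipeline that only accepts 1- or 3-channel batches, so the generator's 4-channel RGBA output trips it. A simplified sketch of that guard (hypothetical helper operating on shape tuples, not the actual `augment.py` code, which works on torch tensors):

```python
def check_image_channels(shape):
    # shape follows the training batch layout: (batch, channels, height, width)
    channels = shape[1]
    if channels not in (1, 3):
        # same message as the augment.py check that aborts RGBA training
        raise ValueError('Image must be RGB (3 channels) or L (1 channel)')
    return channels

check_image_channels((32, 3, 64, 64))       # RGB batch: accepted
try:
    check_image_channels((32, 4, 64, 64))   # RGBA batch: rejected
except ValueError as err:
    print(err)
```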
You are not using the `rgba` branch. How did you clone the repository? The easiest way is to use `git clone https://github.com/PDillis/stylegan3-fun.git -b rgba`. If you don't specify the branch, you will use the `main` branch, and I haven't moved the changes there (yet).
I'm sure I've cloned the RGBA branch. The error seems to appear in `augment.py`, but you don't seem to have modified it.
You're completely right, thank you for pointing it out, and sorry for not reading your error more carefully. I forgot to change `augment.py` because I usually only use `--augpipe=bg`, and it didn't occur to me that the color augmentations would of course fail. If you can show me the complete command you use to train, just in case I missed something, that would be extremely helpful. I will take a look and update the code as soon as possible.
I am not using `--augpipe` in my config; below is the command I am running.
python train.py --cfg=stylegan3-t --data=datasets/test_1.zip --gpus=1 --batch=32 --gamma=6.6 --kimg=10000 --snap=50 --img-snap=10 --snap-res=1080p --workers=1 --batch-gpu=32
Ok, thanks. By default you will use `--aug=ada` and `--augpipe=bgc`, so the color augmentations need to handle RGBA data correctly. It may be simpler than I think, but I will need a bit of time to check everything is right. If you want to train something in the meantime without the color augmentations, set `--augpipe=bg` and it should train (I am currently training a `--cfg=stylegan3-r` with only blit and geometric augmentations and everything is going ok). I will update here once the corrections are done for the color augmentations.
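The underlying fix is conceptually simple: color augmentations should touch only the RGB planes and pass the alpha channel through unchanged. A sketch of that idea (a hypothetical NumPy helper for illustration, not the actual `augment.py` implementation, which uses homogeneous color matrices on GPU tensors):

```python
import numpy as np

def augment_color_rgba(img, brightness=0.0, contrast=1.0):
    """Apply toy color augmentations to the RGB channels of an RGBA batch.

    img: float array of shape [batch, 4, height, width].
    The alpha plane is left untouched, since brightness/contrast changes
    should not alter transparency.
    """
    rgb, alpha = img[:, :3], img[:, 3:]
    rgb = (rgb + brightness) * contrast      # color ops touch RGB only
    return np.concatenate([rgb, alpha], axis=1)

batch = np.zeros((2, 4, 8, 8), dtype=np.float32)
batch[:, 3] = 0.25                           # partially transparent alpha
out = augment_color_rgba(batch, brightness=0.5, contrast=2.0)
# RGB planes become (0 + 0.5) * 2 = 1.0; alpha stays 0.25
```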
By the way, how should the RGBA model be used? Is it possible to use `gen_images.py` directly?
I haven't touched `gen_images.py`, but `generate.py` works so far, even for saving videos. However, `.mp4` videos (which is what I mostly use) do not support the RGBA format, so I need to think a bit about how to handle this; for now I only save the RGB part for the video. I imagine generating images is more important for you, though. Run `python generate.py images --help` to see all the available options.
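Until RGBA video export is sorted out, one common workaround is to alpha-composite each generated frame over a solid background before writing it to the RGB-only `.mp4`. A sketch of that approach (a hypothetical helper, not part of `generate.py`):

```python
import numpy as np

def composite_over_background(frame, bg=(255, 255, 255)):
    """Alpha-composite an RGBA frame over a solid background color.

    frame: uint8 array of shape [height, width, 4].
    Returns a plain RGB uint8 array suitable for an .mp4 encoder,
    which has no alpha channel.
    """
    rgb = frame[..., :3].astype(np.float32)
    alpha = frame[..., 3:4].astype(np.float32) / 255.0
    out = rgb * alpha + np.array(bg, dtype=np.float32) * (1.0 - alpha)
    return np.round(out).astype(np.uint8)

transparent = np.zeros((4, 4, 4), dtype=np.uint8)   # fully transparent black
composited = composite_over_background(transparent)  # shows only the background
```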
See if 3b107d4 fixes the augmentations issue you were having.