yuval-alaluf / stylegan3-editing

Official Implementation of "Third Time's the Charm? Image and Video Editing with StyleGAN3" (AIM ECCVW 2022) https://arxiv.org/abs/2201.13433

Home Page: https://yuval-alaluf.github.io/stylegan3-editing/


Training with ReStyle-pSp algorithm results

uselessai opened this issue

Hi, after training for close to 3 weeks using a GeForce Titan RTX, the results were not satisfactory.

[image: resultadosytlegan250000]

I am working with the Market-1501 dataset, with 39,000 training images at a size of 64x127px.

So, I have some questions about how to improve the performance, or whether that is even possible.

Should I try training with the ReStyle-e4e algorithm instead, or should I keep training for another week?
Could the problem be that the number of images in the dataset is not enough?
Or that the images have too low a resolution?
During training, is it possible to get the latent vectors or the result images? During training the loss is 0.17, but it goes up to 0.5 during testing. The idea is to increase the number of images in the dataset, so I am using the same images for both training and testing.

Sorry for asking so many questions; I am working on my postgraduate thesis, and this part is the most important of my experimental study.

Thanks!
Laura.

Hi, did you first train a StyleGAN generator on your domain using the Market-1501 dataset?

Yes, I have trained the StyleGAN3 model and got the pkl file.
These are some samples from this model:

[image: ImagenesFakeStylegan3002160]

So the pkl model is working fine. After that, I converted the pkl to a .pt file. It seems to have been converted correctly, since I got no errors, but I have no idea how to test the .pt file. How could I test it?

And this .pt file is the model I used to train the ReStyle-pSp algorithm:

python ./inversion/scripts/train_restyle_psp.py --dataset_type market_encode --encoder_type ResNetBackboneEncoder --exp_dir experiments/restyle_psp_ffhq_encode_market --batch_size 2 --test_batch_size 2 --workers 8 --test_workers 8 --val_interval 5000 --save_interval 10000 --start_from_latent_avg True --lpips_lambda 0.8 --l2_lambda 1 --id_lambda 0.1 --input_nc 6 --n_iters_per_batch 3 --output_size 64 --stylegan_weights ./network-snapshot-002160Stylegan3.pt

Thanks!

To verify that you were able to convert the pkl to a pt file correctly, you can try generating random images using something like this:

import numpy as np
import torch

for seed in range(10):
    z = torch.from_numpy(np.random.RandomState(seed).randn(1, G.z_dim)).to(device)
    w = G.mapping(z, None, truncation_psi=truncation_psi)
    img = G.synthesis(w, noise_mode="const")
    img = tensor2im(img)  # tensor2im: repo helper that converts the output tensor to a PIL image

Then check that img looks like realistic output from your generator.
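If you do not yet have a generator handle G for this check, here is a minimal sketch for loading one from the converted pt file using the repo's SG3Generator wrapper (the import path and device handling are assumptions; pass res if your generator was not trained at 1024x1024):

import torch
from models.stylegan3.model import SG3Generator  # wrapper from this repo (import path assumed)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
truncation_psi = 0.7  # any value in (0, 1]; 1.0 disables truncation
G = SG3Generator(checkpoint_path="./network-snapshot-002160Stylegan3.pt").decoder.to(device)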

Regarding the training of the encoder: it looks like you are using the ID loss, but this loss is designed specifically for faces and should not be used for your domain. One option is to replace the ID loss with the MoCo-based loss by setting --moco_lambda 0.5 and --id_lambda 0. I would start by making these changes and see if they improve the results.
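For example, your earlier training command with only those two flags changed (a sketch; every other flag kept exactly as before):

python ./inversion/scripts/train_restyle_psp.py --dataset_type market_encode --encoder_type ResNetBackboneEncoder --exp_dir experiments/restyle_psp_ffhq_encode_market --batch_size 2 --test_batch_size 2 --workers 8 --test_workers 8 --val_interval 5000 --save_interval 10000 --start_from_latent_avg True --lpips_lambda 0.8 --l2_lambda 1 --id_lambda 0 --moco_lambda 0.5 --input_nc 6 --n_iters_per_batch 3 --output_size 64 --stylegan_weights ./network-snapshot-002160Stylegan3.pt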

In general, it could be that your domain (images of full bodies) is very challenging for encoders due to the high diversity of the images. Therefore, you may not be able to get desirable results when training only an encoder.

Another option is to try PTI and see if that leads to good reconstruction. We provide the code here:
https://github.com/yuval-alaluf/stylegan3-editing/blob/main/inversion/scripts/run_pti_images.py
(note: some changes may be required to run it on non-face images)

Thanks for the quick response.

I tried to generate images from the .pt model, but I am getting an error. I am working in Google Colab: first I converted the pkl to the .pt format, and then I generated random images from both models. The pkl model works perfectly, but when I try with the .pt file I get this error:

w = G.mapping(z, None, truncation_psi=1.0)
ImportError: /root/.cache/torch_extensions/py37_cu113/bias_act_plugin/5a406a2b04aa59c6f0c481df2cacdd5c-tesla-t4/bias_act_plugin.so: cannot open shared object file: No such file or directory
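(As an aside, this particular ImportError usually points to a stale compiled-extension cache rather than to the converted weights themselves; a minimal sketch of one common workaround, using the cache location shown in the traceback:)

import shutil
from pathlib import Path

# Remove the cached CUDA extension builds so PyTorch recompiles
# bias_act_plugin from scratch on the next run.
cache_dir = Path.home() / ".cache" / "torch_extensions"
if cache_dir.exists():
    shutil.rmtree(cache_dir)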

Also, I am not sure how this image should look, but the average image (avg_image.jpg) that is generated automatically during training is this one:

[image: 2022_05_25Average_image]

So I am thinking the problem could be with the converted .pt model?

Also, I have been training with the MoCo-based loss, and after 30k iterations this is the result:

[image: 2022_05_26_30000]

Thanks in advance.
Laura.

There is definitely a problem in the conversion of your generator, so I would hold off on training until you are first able to generate images correctly with your model.
How did you try converting your pkl file to the pt file?

This is my code:

import pickle
import torch

checkpoint_path = "/content/network-snapshot-002160Stylegan3.pkl"

print(f"Loading StyleGAN3 generator from path: {checkpoint_path}")
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

with open(checkpoint_path, "rb") as f:
    G = pickle.load(f)['G_ema'].to(device)
print('Done!')

state_dict = G.state_dict()
torch.save(state_dict, "/content/network-snapshot-002160Stylegan3.pt")
print('Done!')
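(A quick sanity check one could add after saving; it only confirms that the saved file round-trips with the same keys, not that it matches the architecture SG3Generator will build:)

# Reload the saved weights and confirm the state_dict keys survived the save.
loaded = torch.load("/content/network-snapshot-002160Stylegan3.pt", map_location="cpu")
assert set(loaded.keys()) == set(G.state_dict().keys()), "key mismatch after save"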

I do not know if it is important, but the StyleGAN3 model was trained with the config --cfg=stylegan3-r.

This is the Google Colab notebook where I am working. I have also checked that the PKL file generates images correctly.

https://colab.research.google.com/drive/1vPF7zz8Rsc6D8_TBUFDRxsslHagebmr9?usp=sharing

Thanks in advance,
Laura.

Hi, I have tried with different NVIDIA PKL models, and I noticed that the problem may be with the size of the images.

stylegan3-r-ffhq-1024x1024.pkl: [image: original PKL output] [image: converted PT output]
stylegan3-r-ffhqu-1024x1024.pkl: [image: original PKL output] [image: converted PT output]
stylegan3-r-metfaces-1024x1024.pkl: [image: original PKL output] [image: converted PT output]
stylegan3-r-metfacesu-1024x1024.pkl: [image: original PKL output] [image: converted PT output]

The generator works perfectly with models that were trained on 1024x1024px images. However, when I try to convert the lower-resolution PKL models, stylegan3-r-afhqv2-512x512.pkl (512x512px) and stylegan3-r-ffhqu-256x256.pkl (256x256px), I get this error.

The error occurs on this line:

generator = SG3Generator(checkpoint_path=model_path).decoder

/content/stylegan3-editing
Loading StyleGAN3 generator from path: /content/network-snapshot-002160Stylegan3.pt

RuntimeError Traceback (most recent call last)
/content/stylegan3-editing/models/stylegan3/model.py in _load_checkpoint(self, checkpoint_path)
60 try:
---> 61 self.decoder.load_state_dict(torch.load(checkpoint_path), strict=True)
62 except:

4 frames
RuntimeError: Error(s) in loading state_dict for Generator:
Missing key(s) in state_dict: "synthesis.L2_52_1024.weight", "synthesis.L2_52_1024.bias", "synthesis.L2_52_1024.magnitude_ema", "synthesis.L2_52_1024.up_filter", "synthesis.L2_52_1024.down_filter", "synthesis.L2_52_1024.affine.weight", "synthesis.L2_52_1024.affine.bias", "synthesis.L4_84_1024.weight", "synthesis.L4_84_1024.bias", "synthesis.L4_84_1024.magnitude_ema", "synthesis.L4_84_1024.up_filter", "synthesis.L4_84_1024.down_filter", "synthesis.L4_84_1024.affine.weight", "synthesis.L4_84_1024.affine.bias", "synthesis.L5_148_1024.weight", "synthesis.L5_148_1024.bias", "synthesis.L5_148_1024.magnitude_ema", "synthesis.L5_148_1024.up_filter", "synthesis.L5_148_1024.down_filter", "synthesis.L5_148_1024.affine.weight", "synthesis.L5_148_1024.affine.bias", "synthesis.L6_148_1024.weight", "synthesis.L6_148_1024.bias", "synthesis.L6_148_1024.magnitude_ema", "synthesis.L6_148_1024.up_filter", "synthesis.L6_148_1024.down_filter", "synthesis.L6_148_1024.affine.weight", "synthesis.L6_148_1024.affine.bias", "synthesis.L7_276_645.weight", "synthesis.L7_276_645.bias", "synthesis.L7_276_645.magnitude_ema", "synthesis.L7_276_645.up_filter", "synthesis.L7_276_645.down_filter", "synthesis.L7_276_645.affine.weight", "synthesis.L7_276_645.affine.bias", "synthesis.L8_276_406.weight", "synthesis.L8_276_406.bias", "synthesis.L8_276_406.magnitude_ema", "synthesis.L8_276_406.up_filter", "synthesis.L8_276_406.down_filter", "synthesis.L8_276_406.affine.weight", "synthesis.L8_276_406.affine.bias", ...
Unexpected key(s) in state_dict: "synthesis.L2_36_1024.weight", "synthesis.L2_36_1024.bias", "synthesis.L2_36_1024.magnitude_ema", "synthesis.L2_36_1024.up_filter", "synthesis.L2_36_1024.down_filter", "synthesis.L2_36_1024.affine.weight", "synthesis.L2_36_1024.affine.bias", "synthesis.L4_52_1024.weight", "synthesis.L4_52_1024.bias", "synthesis.L4_52_1024.magnitude_ema", "synthesis.L4_52_1024.up_filter", "synthesis.L4_52_1024.down_filter", "synthesis.L4_52_1024.affine.weight", "synthesis.L4_52_1024.affine.bias", "synthesis.L5_84_1024.weight", "synthesis.L5_84_1024.bias", "synthesis.L5_84_1024.magnitude_ema", "synthesis.L5_84_1024.up_filter", "synthesis.L5_84_1024.down_filter", "synthesis.L5_84_1024.affine.weight", "synthesis.L5_84_1024.affine.bias", "synthesis.L6_84_1024.weight", "synthesis.L6_84_1024.bias", "synthesis.L6_84_1024.magnitude_ema", "synthesis.L6_84_1024.up_filter", "synthesis.L6_84_1024.down_filter", "synthesis.L6_84_1024.affine.weight", "synthesis.L6_84_1024.affine.bias", "synthesis.L7_148_724.weight", "synthesis.L7_148_724.bias", "synthesis.L7_148_724.magnitude_ema", "synthesis.L7_148_724.up_filter", "synthesis.L7_148_724.down_filter", "synthesis.L7_148_724.affine.weight", "synthesis.L7_148_724.affine.bias", "synthesis.L8_148_512.weight", "synthesis.L8_148_512.bias", "synthesis.L8_148_512.magnitude_ema", "synthesis.L8_148_512.up_filter", "synthesis.L8_148_512.down_filter", "synthesis.L8_148_512.affine.weight", "synthesis.L8_148_512.affine.bias", "synthesis....
size mismatch for synthesis.L3_52_1024.up_filter: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([12]).

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
1496 if len(error_msgs) > 0:
1497 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
-> 1498 self.__class__.__name__, "\n\t".join(error_msgs)))
1499 return _IncompatibleKeys(missing_keys, unexpected_keys)
1500

RuntimeError: Error(s) in loading state_dict for Generator:
size mismatch for synthesis.L3_52_1024.up_filter: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([12]).

Thanks in advance.

Hi,
I just ran your notebook.
There was one correction needed: when defining the generator using your pt file, you need to set the res parameter to 512 (the default is 1024). The layer names and filter sizes in the synthesis network depend on the output resolution, which is why the state_dict keys did not match.
Specifically, in this part of the code:

import os

image_numbers = 2
save_dir = "/content/imgs"
truncation_psi = 1
model_path = "/content/network-snapshot-002160Stylegan3.pt"
if not os.path.exists(save_dir):
    os.makedirs(save_dir)

generator = SG3Generator(checkpoint_path=model_path, res=512).decoder

When I ran the notebook after this correction, I got the expected results:
The result with the original pkl file (seed 1):
[image: seed0001]

The result with the converted pt file (seed 1):
[image: seed_pt_0001]

Note: the script from the original stylegan3 repo starts its seeds at 1, while our code starts at 0. Therefore, I needed to set image_numbers = 2 to generate the image corresponding to seed 1.
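In other words (a tiny illustration of the offset):

# range(image_numbers) yields seeds 0 and 1, so image_numbers = 2 is needed
# to reach seed 1, the first seed the original stylegan3 scripts use.
image_numbers = 2
for seed in range(image_numbers):
    print(f"generating image for seed {seed}")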

Thanks, I truly appreciate your help. It works!