yuval-alaluf / stylegan3-editing

Official Implementation of "Third Time's the Charm? Image and Video Editing with StyleGAN3" (AIM ECCVW 2022) https://arxiv.org/abs/2201.13433

Home Page: https://yuval-alaluf.github.io/stylegan3-editing/


Training with ReStyle-pSp algorithm results

uselessai opened this issue

Hi, after training for close to 3 weeks using a GeForce Titan RTX, the results were not satisfactory.

[image: resultadosytlegan250000]

I am working with the Market-1501 dataset, with 39,000 training images at a size of 64x127px.

So, I have some questions about how to improve the performance, or whether that is even possible.

Should I try training with the ReStyle-e4e algorithm instead, or should I keep training for another week?
Could the problem be that the number of images in the dataset is not enough?
Or that the images have too low a resolution?
During training, is it possible to get the latent vectors or the result images? During training the loss is 0.17, but it goes up to 0.5 during testing. The idea is to increase the number of images in the dataset, so I am using the same images for both training and testing.

Sorry for asking so many questions; I am working on my postgraduate thesis, and this part is the most important of my experimental study.

Thanks!
Laura.

Hi, did you first train a StyleGAN generator on your domain using the Market-1501 dataset?

Yes, I have trained the StyleGAN3 model and got the pkl file.
These are some samples from this model:

[image: ImagenesFakeStylegan3002160]

So the pkl model is working fine. After that, I converted the pkl to a .pt file. It seems to have been converted correctly, since I got no errors, but I have no idea how to test the .pt file. How could I test it?

And this .pt file is the model I used to train the ReStyle-pSp algorithm:

python ./inversion/scripts/train_restyle_psp.py --dataset_type market_encode --encoder_type ResNetBackboneEncoder --exp_dir experiments/restyle_psp_ffhq_encode_market --batch_size 2 --test_batch_size 2 --workers 8 --test_workers 8 --val_interval 5000 --save_interval 10000 --start_from_latent_avg True --lpips_lambda 0.8 --l2_lambda 1 --id_lambda 0.1 --input_nc 6 --n_iters_per_batch 3 --output_size 64 --stylegan_weights ./network-snapshot-002160Stylegan3.pt

Thanks!

To verify that you were able to convert the pkl to a pt file correctly, you can try generating random images using something like this:

import numpy as np
import torch

for seed in range(10):
    z = torch.from_numpy(np.random.RandomState(seed).randn(1, G.z_dim)).to(device)
    w = G.mapping(z, None, truncation_psi=truncation_psi)
    img = G.synthesis(w, noise_mode="const")
    img = tensor2im(img)  # tensor2im: repo helper that converts the output tensor to a PIL image

Then check that img looks like realistic output from your generator.
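If you do not yet have a generator handle G for this check, here is a minimal sketch for loading one from the converted pt file using the repo's SG3Generator wrapper (the import path and device handling are assumptions; pass res if your generator was not trained at 1024x1024):

import torch
from models.stylegan3.model import SG3Generator  # wrapper from this repo (import path assumed)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
truncation_psi = 0.7  # any value in (0, 1]; 1.0 disables truncation
G = SG3Generator(checkpoint_path="./network-snapshot-002160Stylegan3.pt").decoder.to(device)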

Regarding the training of the encoder: it looks like you are using the ID loss, but this loss is designed specifically for faces and should not be used for your domain. One option is to replace the ID loss with the MoCo-based loss by setting --moco_lambda 0.5 and --id_lambda 0. I would start by making these changes and see if they improve the results.
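For example, your earlier training command with only those two flags changed (a sketch; every other flag kept exactly as before):

python ./inversion/scripts/train_restyle_psp.py --dataset_type market_encode --encoder_type ResNetBackboneEncoder --exp_dir experiments/restyle_psp_ffhq_encode_market --batch_size 2 --test_batch_size 2 --workers 8 --test_workers 8 --val_interval 5000 --save_interval 10000 --start_from_latent_avg True --lpips_lambda 0.8 --l2_lambda 1 --id_lambda 0 --moco_lambda 0.5 --input_nc 6 --n_iters_per_batch 3 --output_size 64 --stylegan_weights ./network-snapshot-002160Stylegan3.pt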

In general, it could be that your domain (images of full bodies) is very challenging for encoders due to the high diversity of the images. Therefore, you may not be able to get desirable results when training only an encoder.

Another option is to try PTI and see if that leads to good reconstruction. We provide the code here:
https://github.com/yuval-alaluf/stylegan3-editing/blob/main/inversion/scripts/run_pti_images.py
(note: some changes may be required to run it on non-face images)

Thanks for the quick response.

I tried to generate images from the .pt model, but I am getting an error. I am working in Google Colab: first I converted the pkl to the .pt format, and then I generated random images from both models. The pkl model works perfectly, but when I try with the .pt file I get this error:

w = G.mapping(z, None, truncation_psi=1.0)
ImportError: /root/.cache/torch_extensions/py37_cu113/bias_act_plugin/5a406a2b04aa59c6f0c481df2cacdd5c-tesla-t4/bias_act_plugin.so: cannot open shared object file: No such file or directory
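(As an aside, this particular ImportError usually points to a stale compiled-extension cache rather than to the converted weights themselves; a minimal sketch of one common workaround, using the cache location shown in the traceback:)

import shutil
from pathlib import Path

# Remove the cached CUDA extension builds so PyTorch recompiles
# bias_act_plugin from scratch on the next run.
cache_dir = Path.home() / ".cache" / "torch_extensions"
if cache_dir.exists():
    shutil.rmtree(cache_dir)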

Also, I am not sure how this image should look, but the average image (avg_image.jpg) that is generated automatically during training is this one:

[image: 2022_05_25Average_image]

So I am thinking the problem could be with the converted .pt model?

Also, I have been training with the MoCo-based loss, and after 30k iterations this is the result:

[image: 2022_05_26_30000]

Thanks in advance.
Laura.

There is definitely a problem in the conversion of your generator, so I would hold off on training until you are first able to generate images correctly with your model.
How did you try converting your pkl file to the pt file?

This is my code:

import pickle
import torch

checkpoint_path = "/content/network-snapshot-002160Stylegan3.pkl"

print(f"Loading StyleGAN3 generator from path: {checkpoint_path}")
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

with open(checkpoint_path, "rb") as f:
    G = pickle.load(f)['G_ema'].to(device)
print('Done!')

state_dict = G.state_dict()
torch.save(state_dict, "/content/network-snapshot-002160Stylegan3.pt")
print('Done!')
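(A quick sanity check one could add after saving; it only confirms that the saved file round-trips with the same keys, not that it matches the architecture SG3Generator will build:)

# Reload the saved weights and confirm the state_dict keys survived the save.
loaded = torch.load("/content/network-snapshot-002160Stylegan3.pt", map_location="cpu")
assert set(loaded.keys()) == set(G.state_dict().keys()), "key mismatch after save"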

I do not know if it is important, but the StyleGAN3 model was trained with the config --cfg=stylegan3-r.

This is the Google Colab notebook where I am working. I have also checked that the PKL file generates images correctly.

https://colab.research.google.com/drive/1vPF7zz8Rsc6D8_TBUFDRxsslHagebmr9?usp=sharing

Thanks in advance,
Laura.

Hi, I have tried with different NVIDIA PKL models, and I noticed that the problem may be with the size of the images.

stylegan3-r-ffhq-1024x1024.pkl: [image: original PKL output] [image: converted PT output]
stylegan3-r-ffhqu-1024x1024.pkl: [image: original PKL output] [image: converted PT output]
stylegan3-r-metfaces-1024x1024.pkl: [image: original PKL output] [image: converted PT output]
stylegan3-r-metfacesu-1024x1024.pkl: [image: original PKL output] [image: converted PT output]

The generator works perfectly with models that were trained on 1024x1024px images. However, when I try to convert the lower-resolution PKL models, stylegan3-r-afhqv2-512x512.pkl (512x512px) and stylegan3-r-ffhqu-256x256.pkl (256x256px), I get this error.

The error occurs on this line:

generator = SG3Generator(checkpoint_path=model_path).decoder

/content/stylegan3-editing
Loading StyleGAN3 generator from path: /content/network-snapshot-002160Stylegan3.pt

RuntimeError Traceback (most recent call last)
/content/stylegan3-editing/models/stylegan3/model.py in _load_checkpoint(self, checkpoint_path)
60 try:
---> 61 self.decoder.load_state_dict(torch.load(checkpoint_path), strict=True)
62 except:

4 frames
RuntimeError: Error(s) in loading state_dict for Generator:
Missing key(s) in state_dict: "synthesis.L2_52_1024.weight", "synthesis.L2_52_1024.bias", "synthesis.L2_52_1024.magnitude_ema", "synthesis.L2_52_1024.up_filter", "synthesis.L2_52_1024.down_filter", "synthesis.L2_52_1024.affine.weight", "synthesis.L2_52_1024.affine.bias", "synthesis.L4_84_1024.weight", "synthesis.L4_84_1024.bias", "synthesis.L4_84_1024.magnitude_ema", "synthesis.L4_84_1024.up_filter", "synthesis.L4_84_1024.down_filter", "synthesis.L4_84_1024.affine.weight", "synthesis.L4_84_1024.affine.bias", "synthesis.L5_148_1024.weight", "synthesis.L5_148_1024.bias", "synthesis.L5_148_1024.magnitude_ema", "synthesis.L5_148_1024.up_filter", "synthesis.L5_148_1024.down_filter", "synthesis.L5_148_1024.affine.weight", "synthesis.L5_148_1024.affine.bias", "synthesis.L6_148_1024.weight", "synthesis.L6_148_1024.bias", "synthesis.L6_148_1024.magnitude_ema", "synthesis.L6_148_1024.up_filter", "synthesis.L6_148_1024.down_filter", "synthesis.L6_148_1024.affine.weight", "synthesis.L6_148_1024.affine.bias", "synthesis.L7_276_645.weight", "synthesis.L7_276_645.bias", "synthesis.L7_276_645.magnitude_ema", "synthesis.L7_276_645.up_filter", "synthesis.L7_276_645.down_filter", "synthesis.L7_276_645.affine.weight", "synthesis.L7_276_645.affine.bias", "synthesis.L8_276_406.weight", "synthesis.L8_276_406.bias", "synthesis.L8_276_406.magnitude_ema", "synthesis.L8_276_406.up_filter", "synthesis.L8_276_406.down_filter", "synthesis.L8_276_406.affine.weight", "synthesis.L8_276_406.affine.bias", ...
Unexpected key(s) in state_dict: "synthesis.L2_36_1024.weight", "synthesis.L2_36_1024.bias", "synthesis.L2_36_1024.magnitude_ema", "synthesis.L2_36_1024.up_filter", "synthesis.L2_36_1024.down_filter", "synthesis.L2_36_1024.affine.weight", "synthesis.L2_36_1024.affine.bias", "synthesis.L4_52_1024.weight", "synthesis.L4_52_1024.bias", "synthesis.L4_52_1024.magnitude_ema", "synthesis.L4_52_1024.up_filter", "synthesis.L4_52_1024.down_filter", "synthesis.L4_52_1024.affine.weight", "synthesis.L4_52_1024.affine.bias", "synthesis.L5_84_1024.weight", "synthesis.L5_84_1024.bias", "synthesis.L5_84_1024.magnitude_ema", "synthesis.L5_84_1024.up_filter", "synthesis.L5_84_1024.down_filter", "synthesis.L5_84_1024.affine.weight", "synthesis.L5_84_1024.affine.bias", "synthesis.L6_84_1024.weight", "synthesis.L6_84_1024.bias", "synthesis.L6_84_1024.magnitude_ema", "synthesis.L6_84_1024.up_filter", "synthesis.L6_84_1024.down_filter", "synthesis.L6_84_1024.affine.weight", "synthesis.L6_84_1024.affine.bias", "synthesis.L7_148_724.weight", "synthesis.L7_148_724.bias", "synthesis.L7_148_724.magnitude_ema", "synthesis.L7_148_724.up_filter", "synthesis.L7_148_724.down_filter", "synthesis.L7_148_724.affine.weight", "synthesis.L7_148_724.affine.bias", "synthesis.L8_148_512.weight", "synthesis.L8_148_512.bias", "synthesis.L8_148_512.magnitude_ema", "synthesis.L8_148_512.up_filter", "synthesis.L8_148_512.down_filter", "synthesis.L8_148_512.affine.weight", "synthesis.L8_148_512.affine.bias", "synthesis....
size mismatch for synthesis.L3_52_1024.up_filter: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([12]).

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
1496 if len(error_msgs) > 0:
1497 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
-> 1498 self.__class__.__name__, "\n\t".join(error_msgs)))
1499 return _IncompatibleKeys(missing_keys, unexpected_keys)
1500

RuntimeError: Error(s) in loading state_dict for Generator:
size mismatch for synthesis.L3_52_1024.up_filter: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([12]).

Thanks in advance.

Hi,
I just ran your notebook.
There was one correction needed: when defining the generator using your pt file, you need to set the res parameter to 512 (the default is 1024). The layer names and filter sizes in the synthesis network depend on the output resolution, which is why the state_dict keys did not match.
Specifically, in this part of the code:

import os

image_numbers = 2
save_dir = "/content/imgs"
truncation_psi = 1
model_path = "/content/network-snapshot-002160Stylegan3.pt"
if not os.path.exists(save_dir):
    os.makedirs(save_dir)

generator = SG3Generator(checkpoint_path=model_path, res=512).decoder

When I ran the notebook after this correction, I got the expected results:
The result with the original pkl file (seed 1):
[image: seed0001]

The result with the converted pt file (seed 1):
[image: seed_pt_0001]

Note: the script from the original stylegan3 repo starts its seeds at 1, while our code starts at 0. Therefore, I needed to set image_numbers = 2 to generate the image corresponding to seed 1.
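In other words (a tiny illustration of the offset):

# range(image_numbers) yields seeds 0 and 1, so image_numbers = 2 is needed
# to reach seed 1, the first seed the original stylegan3 scripts use.
image_numbers = 2
for seed in range(image_numbers):
    print(f"generating image for seed {seed}")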

Thanks, I truly appreciate your help. It works!