eladrich / pixel2style2pixel

Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework

Home Page: https://eladrich.github.io/pixel2style2pixel/

Question about encoding images

Eric07110904 opened this issue

Thanks for your awesome work! I have a question about GAN inversion.
I used pSp to do GAN inversion in the anime domain (512x512, 300k images), together with a pre-trained anime StyleGAN2 (512x512).
After training for 100,000 iterations with batch_size=4, I observed two problems.

  1. The detailed structure of the anime face is lost (it seems that my model didn't capture the mouth or winking eyes).
  2. The output is blurred.

Do you have any suggestions for solving these two problems?
I am wondering if some parameter is set wrong, or whether I should train for more iterations or add the w_norm loss.
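For context, my understanding is that the w_norm loss simply penalizes the distance of the predicted latents from the generator's average latent. A minimal sketch of that idea (illustrative, not the repo's exact implementation):

```python
import torch

def w_norm_loss(w, w_avg):
    # w:     (batch, n_styles, 512) latents predicted by the encoder
    # w_avg: (512,) average latent of the pre-trained StyleGAN2 generator
    # Penalizing the squared distance from the average latent keeps the
    # inversion close to the well-behaved region of W+.
    return torch.mean((w - w_avg) ** 2)
```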
Thanks for your reply!

Using the ID loss is a bit strange here, since it was trained on real face images and your anime dataset is out of domain for it.
Other than that, it could be that pSp is not able to fully capture all the details here. You could try more advanced encoders such as ReStyle and HyperStyle, or optimization-based approaches like PTI.
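For what it's worth, disabling the ID loss should just be a matter of setting --id_lambda=0 in the training options; the repo also exposes a MoCo-based similarity loss (--moco_lambda) as a more domain-agnostic substitute. Conceptually, such a loss is a cosine distance in a generic feature space rather than a face-identity embedding. A minimal sketch of the idea, with feat_extractor standing in for the MoCo backbone (illustrative, not the repo's exact implementation):

```python
import torch.nn.functional as F

def similarity_loss(feat_extractor, y_hat, y):
    # feat_extractor: any pre-trained feature network (here a stand-in
    # for the MoCo backbone); y_hat: reconstructions, y: targets
    f_hat = F.normalize(feat_extractor(y_hat), dim=-1)
    f = F.normalize(feat_extractor(y), dim=-1)
    # 1 - cosine similarity, averaged over the batch
    return (1.0 - (f_hat * f).sum(dim=-1)).mean()
```

Combined with a nonzero --w_norm_lambda, this may help with the blurriness and missing details without pulling the encoder toward real-face identities.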