maum-ai / faceshifter

Unofficial PyTorch Implementation for FaceShifter (https://arxiv.org/abs/1912.13457)

help with training

y-x-c opened this issue

Thanks for the awesome code! I am training my own model right now and have a few questions:

  • Currently I am using 100k (out of around 1.8M) images from CelebAMask-HQ, FFHQ, and VGGFace to train the model. Did you use the full set to train your model?
  • I no longer see large improvements in most losses (160k steps trained, 4 GPUs × 12 images/batch); is this normal? Should I just continue training for more steps?
    [screenshots: loss curves]
  • I also checked the validation results, and the reconstruction is not good.
    [screenshots: validation reconstructions]
  • I noticed that shuffle is not set to True for the training dataloader; did you use the same setting?

Thanks!

Hi! You did very fast training!

  1. Yes, I used the full dataset. I don't know about the IJB-C dataset. The distribution of the dataset can influence your model.

  2. In the paper, they trained for 500K steps; I trained for over 500K. To my eye, your attribute loss is going down, but the Rec and ID losses are unstable. In my case, those two losses were more stable and lower at the same number of steps.

  3. The shuffle option in the training dataloader should be True. That is clearly my mistake from when I published the code.
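
For anyone applying that fix, here is a minimal sketch of the corrected loader; the dataset below is a stand-in, not the repo's actual dataset class:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the repo's face dataset (placeholder shapes).
train_dataset = TensorDataset(torch.randn(100, 3, 256, 256))

train_loader = DataLoader(
    train_dataset,
    batch_size=16,
    shuffle=True,   # the fix: the published code left shuffle at its default (False)
    num_workers=8,
    drop_last=True,
)
```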

Thanks for your reply.

  1. I just corrected the description; I am using the same datasets (CelebAMask-HQ, FFHQ, and VGGFace) as well.

  • So in your case each step has 64 images; and if there are about 1.5M images in those three datasets, you trained for around 4 epochs (= 64 * 500000 / 1500000 / 5) in total? (See the sanity check after this exchange.)
  • In my case each step only has 48 images, so maybe that's why the two losses are higher at the same number of steps.
  • I found the Rec loss drops much lower in the third epoch, and the results are much better than before. I will continue my current training and see what happens.
  3. Thanks for the clarification; I also changed shuffle to True during my training.
  • I trained with a batch size of 32, the same as the paper (two V100 32GB GPUs, 16 images per GPU).

  • Training GANs is very unstable. If your loss is going down, I think it is working well.
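
A quick sanity check on the epoch arithmetic above (the step count, batch sizes, and dataset size are the rough figures quoted in this thread, not exact counts):

```python
steps = 500_000            # training steps reported in the paper
dataset_size = 1_500_000   # ~1.5M images across CelebAMask-HQ + FFHQ + VGGFace

# epochs = images seen / dataset size = steps * global_batch / dataset_size
for global_batch in (64, 48, 32):
    epochs = steps * global_batch / dataset_size
    print(f"global batch {global_batch}: ~{epochs:.1f} epochs")

# global batch 64: ~21.3 epochs
# global batch 48: ~16.0 epochs
# global batch 32: ~10.7 epochs
```

At these figures, the full 500K-step run works out to roughly 10-21 epochs depending on the global batch size.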

Hi, did you change the coefficients of the different loss terms? I found my training unstable with the coefficients provided by the author...
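
For context, the paper combines the AEI-Net generator losses as L = L_adv + λ_att·L_att + λ_id·L_id + λ_rec·L_rec, with λ_att = λ_rec = 10 and λ_id = 5. A minimal sketch of that weighted sum (the function name and signature are illustrative, not this repo's API):

```python
import torch

def aei_total_loss(l_adv, l_att, l_id, l_rec,
                   w_att=10.0, w_id=5.0, w_rec=10.0):
    # Defaults follow the coefficients reported in the FaceShifter paper;
    # if training is unstable, these weights are the knobs to tune.
    return l_adv + w_att * l_att + w_id * l_id + w_rec * l_rec

# Dummy scalar losses, just to show the call shape:
total = aei_total_loss(torch.tensor(0.8), torch.tensor(0.1),
                       torch.tensor(0.5), torch.tensor(0.2))
print(total)  # tensor(6.3000)
```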

Does that mean you used 'dp' instead of 'ddp'? Since in 'ddp' mode the whole batch is not divided between GPUs.
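
A minimal sketch of that difference (illustrative model and sizes; the DDP process-group setup is omitted):

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 512)

# 'dp' (nn.DataParallel): a single process; the batch you feed in is the
# GLOBAL batch and is split across GPUs, so batch_size=32 on two GPUs
# means 16 samples per GPU, matching the numbers quoted above.
dp_model = nn.DataParallel(model.cuda(), device_ids=[0, 1])
out = dp_model(torch.randn(32, 512).cuda())

# 'ddp' (DistributedDataParallel): one process per GPU; the batch you feed
# each process is the PER-GPU batch and is NOT divided further. To match a
# global batch of 32 on two GPUs, each process uses batch_size=16, together
# with a DistributedSampler on its dataloader.
# ddp_model = nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[rank])
```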

@y-x-c Hi, have you gotten satisfying results? I trained with just the FFHQ and CelebA-HQ datasets, about 90 thousand images. The results are bad, as shown below.
[screenshots: face-swap results]

By 4 epochs you mean 26...

I am about two weeks into training, at about the halfway mark. I noticed that some of the results on the Colab show image artifacts. Is that present in all final results? Did you manage to fix that with more training?

@princessmittens I am also working on this paper and I have some questions about this implementation. May I have your email address?

@usingcolor I am near the end of training and these are my results. I trained on 8 GPUs with 32 GB of memory and a batch size of 21 per GPU. The results have been pretty bad so far. I tried my best to recreate the exact parameters with all 3 datasets (~1.3 million images after processing) and have trained for about 2-3 weeks. With my current batch size, and going by the results, I'm at about the 79% mark relative to 500k steps.

@y-x-c Have you been able to recreate better results? Is it worth continuing?

This has cost a lot of money/time. Any input would be great.

@hanikh Not sure how much I can help you considering my results, but my email is <>

Src: [source image]
Target: [two target images]
Results: [two result images]

Hello, has anyone here gotten good results?

No. I have talked to @hanikh. I don't think anyone has been able to recreate the results as of yet.

I have better results than this. @princessmittens, can you leave your e-mail and I will reach out to you?

@cwalt2014 Can you please share the source code or pretrained weights? I would appreciate it. My email: tamvannguyen200795@gmail.com
Thanks.

@cwalt2014 Dear friend! Can you please share the source code? Thanks a million 🙏🙏. My email: zhangjiajia827@gmail.com
Thanks!🙏🙏🙏

@cwalt2014 my email is dzl0418@gmail.com

@cwalt2014 I would also love to know what changes you would suggest to get better results 🙏 my mail is paulchvn@gmail.com

@cwalt2014 Could you please share the source code or pretrained weights? Thank you. My Email: seanzlxu@gmail.com

@cwalt2014 Thank you very much. My email: 476369545@qq.com

@cwalt2014 Thank you. My email is: lefsiva7@gmail.com

@cwalt2014 Thank you very much. My email is: 302926535@qq.com

Hi, @cwalt2014, could you please send me some of your results? I wonder what the possible results look like. My email is 1085425753@qq.com. Any reply will be appreciated.

@cwalt2014 Can you share your code or pretrained weights?? Thank you soooo much!!
My email is: suzieya26@gmail.com

@cwalt2014 Could you share your code or pretrained weights?? Thank you very much!! My email is: daisy.zdcc@gmail.com

@Daisy-Zhang @suzie26 @tyrink @akafen @lefsiva @chinasilva @Seanseattle @Poloangelo @ZhiluDing @princessmittens @DeliaJIAMIN @tamnguyenvan @cwalt2014 Check out HifiFace, our implementation of a more recent face-swapping model with the pre-trained model.

@cwalt2014 hello, could you please share your pretrained weights? Thank you so much! My email is: chuer.yu1995@gmail.com

@cwalt2014 could you please share your pre-trained weights? I would really appreciate it! My email is: 9788667@gmail.com

@antonsanchez Could you please share the pretrained weights? Thank you so much 🙏🙏🙏
My email: niuyuanc@163.com
Thanks🙏🙏🙏

@cwalt2014 could you please share your pre-trained weights? I would really appreciate it! My email is: galmizush@gmail.com

@cwalt2014 could you please share your pre-trained weights? My email is: jaep0805@snu.ac.kr
Thank you so much