VITA-Group / TransGAN

[NeurIPS 2021] "TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up", Yifan Jiang, Shiyu Chang, Zhangyang Wang

FID score

guoliangq opened this issue

Why is the FID score always NaN on CIFAR-10? Even when I calculate FID between two identical datasets, the result is still NaN.

Hi @gl0513 ,

Did you calculate FID by launching a separate python test.py process?

No, I calculated FID with cifar_train.py.

I would suggest using test.py instead. That said, train.py should not produce NaN either. Could you try test.py again? Just pass the checkpoint path via --load_path.

@yueruchen Hi! Any updates on this issue? I cloned the repo and ran the vanilla cifar_train.py script, and I still get the NaN error. May I ask what environment you used for your runs? We ran it on an A100 GPU with all the Python packages from your requirements.txt.

Regarding your suggestion to use test.py: I think it is necessary to include validation of the model during training, so that we can track its performance. Would you kindly take a look and see whether you can reproduce and solve this issue? Thanks a lot!

Hi @yzhwang ,
I'm unable to reproduce this NaN on my side, so I would encourage you to run test.py as a separate job. You can still use it to track performance during training, since train.py saves a checkpoint every epoch and test.py loads the checkpoint automatically.
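
For readers who want to automate the two-job setup described above, here is a minimal sketch of a polling loop that reacts to each newly saved checkpoint. It is not taken from the repo: the checkpoint directory, the "*.pth" file pattern, and the evaluate callback are all assumptions made for illustration.

```python
import time
from pathlib import Path


def newest_checkpoint(ckpt_dir, pattern="*.pth"):
    """Return the most recently modified file matching pattern, or None."""
    files = list(Path(ckpt_dir).glob(pattern))
    return max(files, key=lambda p: p.stat().st_mtime_ns) if files else None


def watch_checkpoints(ckpt_dir, evaluate, poll_seconds=60.0, max_polls=None):
    """Call evaluate(path) once for every new or rewritten checkpoint.

    evaluate is a hypothetical callback; it could, for example, launch
    test.py with the checkpoint path passed via --load_path.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        latest = newest_checkpoint(ckpt_dir)
        if latest is not None:
            # Key on (path, mtime) so a checkpoint that is overwritten
            # in place every epoch is still evaluated again.
            key = (latest, latest.stat().st_mtime_ns)
            if key not in seen:
                seen.add(key)
                evaluate(latest)
        polls += 1
        time.sleep(poll_seconds)
```

One caveat with this kind of loop: it may pick up a checkpoint while the training job is still writing it, so a short delay or a retry inside evaluate is probably worth adding.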

Thanks Yueru, that is exactly what I'm doing right now.

Have you solved this problem? I face the same problem with another codebase written in PyTorch.

Hi @Jamie-Cheung ,

Sorry, it is not solved. The best way is still to run two separate jobs.
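
Since the root cause was never identified in this thread, one way to narrow it down is to check the FID arithmetic in isolation. The sketch below is not the repo's implementation. It computes the standard Fréchet distance, ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 sqrt(C_a C_b)), from two feature arrays using only NumPy, and raises an error if the features themselves are non-finite. It assumes the Inception features have already been extracted into (N, D) arrays, and it replaces the usual matrix square root with the eigenvalues of C_a C_b, whose square roots sum to the same trace.

```python
import numpy as np


def fid_from_features(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (N, D) feature arrays."""
    feats_a = np.asarray(feats_a, dtype=np.float64)
    feats_b = np.asarray(feats_b, dtype=np.float64)
    for name, feats in (("feats_a", feats_a), ("feats_b", feats_b)):
        # Non-finite features propagate into the means and covariances,
        # so fail early instead of returning NaN.
        if not np.isfinite(feats).all():
            raise ValueError(f"{name} contains NaN or Inf")

    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr(sqrt(cov_a @ cov_b)) is the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b. Clip tiny negative values caused by
    # round-off before taking the square root.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

Calling fid_from_features(x, x) on the same finite features should give a value very close to 0. If that holds but the training script still reports NaN, the problem is more likely in feature extraction or in the environment than in the distance formula.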