FID score

Question

guoliangq opened this issue 3 years ago

guoliangqi commented 3 years ago

Why is the FID score always NaN on CIFAR10? Even when I calculate the FID score between two identical datasets, the result is NaN.
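For reference, the sanity check described above (FID between two identical datasets) can be sketched in isolation. This is a minimal NumPy sketch of the Fréchet distance between two Gaussians fitted to feature arrays. It is not this repo's FID code: the function names `gaussian_stats` and `frechet_distance` are hypothetical, and random vectors stand in for Inception features. It only illustrates that identical inputs should give a value near 0, not NaN.

```python
import numpy as np


def gaussian_stats(features):
    """Return the mean vector and covariance matrix of an (N, D) feature array."""
    features = np.asarray(features, dtype=np.float64)
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    return mu, sigma


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Tr((sigma1 sigma2)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma1 @ sigma2, which are real and non-negative in
    # exact arithmetic when both matrices are positive semi-definite.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    # Drop imaginary parts and clip small negative values caused by rounding.
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


# Stand-in features (assumption): 256 samples of dimension 8.
rng = np.random.default_rng(0)
feats = rng.normal(size=(256, 8))
mu, sigma = gaussian_stats(feats)
same = frechet_distance(mu, sigma, mu, sigma)
print("finite:", np.isfinite(same), "distance:", same)
```

If a check like this returns a finite value while the training script still reports NaN, it may be worth testing the extracted features with `np.isfinite` before the statistics are computed, since any NaN in the inputs propagates to the final score.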

Yifan Jiang · Answer 1 · Sat Nov 20 2021 03:23:42 GMT+0800 (China Standard Time)

Hi @gl0513 ,

Do you calculate FID by launching a separate python test.py process?

gl0513 · Answer 2 · Sat Nov 20 2021 10:02:33 GMT+0800 (China Standard Time)

No, I calculated FID with cifar_train.py.

Yifan Jiang · Answer 3 · Sun Nov 21 2021 10:53:23 GMT+0800 (China Standard Time)

I would suggest using test.py instead. That said, train.py should not produce NaN either. Could you try test.py again? Just pass the checkpoint path via --load_path.
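As a rough illustration of running the evaluation as its own process, here is a minimal Python sketch. Only the script name test.py and the --load_path flag come from this thread. The helper names `build_test_command` and `run_test` and the checkpoint path in the usage comment are hypothetical, and test.py may need further arguments that are not shown here.

```python
import subprocess
import sys


def build_test_command(checkpoint_path, script="test.py"):
    """Build the argument list for evaluating one checkpoint in a separate process."""
    # --load_path is the flag mentioned in this thread; any other arguments
    # that test.py expects would have to be appended by the caller.
    return [sys.executable, script, "--load_path", str(checkpoint_path)]


def run_test(checkpoint_path):
    """Launch test.py as a separate job and return its exit code."""
    cmd = build_test_command(checkpoint_path)
    return subprocess.run(cmd, check=False).returncode


# Usage (hypothetical checkpoint path):
# exit_code = run_test("path/to/checkpoint.pth")
```

Keeping the command construction in its own function makes it easy to print the command and check it before launching the job.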

Yangzihao Wang · Answer 4 · Tue Jan 11 2022 23:51:46 GMT+0800 (China Standard Time)

@yueruchen Hi! Are there any updates on this issue? I cloned the repo and ran the vanilla cifar_train.py script, and I still get the NaN error. May I ask what environment you use to run it? We run it on an A100 GPU with all the Python packages from your requirements.txt.

Regarding your suggestion to use test.py: I think it is still necessary to include validation of the model during training, so that we can track the model's performance. Would you kindly take a look and see if you can reproduce and solve this issue? Thanks a lot!

Yifan Jiang · Answer 5 · Thu Jan 13 2022 00:14:31 GMT+0800 (China Standard Time)

Hi @yzhwang ,
I'm unable to reproduce this NaN on my side, so I would encourage you to run test.py as a separate job. You can still use it to track performance during training, since train.py saves a checkpoint every epoch and test.py loads the checkpoint automatically.

Yangzihao Wang · Answer 6 · Thu Jan 13 2022 09:19:09 GMT+0800 (China Standard Time)

Thanks Yueru, that is exactly what I'm doing right now.

Zhanjie Zhang · Answer 7 · Thu Mar 10 2022 19:05:51 GMT+0800 (China Standard Time)

Have you solved this problem? I face the same problem when I run another codebase written in PyTorch.

Yifan Jiang · Answer 8 · Tue Mar 15 2022 09:27:07 GMT+0800 (China Standard Time)

Hi @Jamie-Cheung ,

Sorry, it is not solved. The best way is still to run two separate jobs.