bioinf-jku / TTUR

Two time-scale update rule for training GANs

FID's applicability for smaller datasets

mseitzer opened this issue · comments

Hi,

I have a few questions about FID score:

  1. I have a dataset smaller than 2048 images, but I still want to compute the FID score. I understand that 2048 images are required to get a full-rank covariance matrix. Computing the FID on my dataset still gives sensible values, with no complex numbers, NaNs, or warnings. Can I still trust an FID computed this way as a measure of visual quality?

  2. In this paper, section 5.1, it is noted:

We observe that FID has rather high bias, but small variance. From this perspective, estimating the full covariance matrix might be unnecessary and counter-productive, and a constrained version might suffice.

What is your take on this? Does this hint at FID being usable with smaller datasets?

  3. Do you think an FID computed on lower-level feature maps of Inception is still meaningful? Since the features at lower levels still have spatial extent, spatial pooling would have to be applied first. My thought here is to try to make FID work with smaller datasets.
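The pooling step in question 3 could be sketched as follows (a minimal sketch: the feature-map shape, sample count, and choice of intermediate layer are assumptions for illustration, not values from the thread):

```python
import numpy as np

def pool_feature_maps(fmaps):
    """Global average pooling over the spatial axes.

    fmaps: array of shape (N, C, H, W), e.g. an intermediate
    Inception activation (which layer to tap is an open choice).
    Returns an (N, C) matrix whose rows can go into the usual
    mean/covariance computation for FID.
    """
    return fmaps.mean(axis=(2, 3))

# With C much smaller than 2048, far fewer samples suffice for a
# full-rank covariance estimate (shapes below are hypothetical).
feats = pool_feature_maps(np.random.rand(500, 192, 17, 17))
mu, sigma = feats.mean(axis=0), np.cov(feats, rowvar=False)
```

The lower feature dimensionality is what would make the covariance estimable from a small dataset.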

I also ported your implementation to PyTorch, for people who do not want to have the Tensorflow dependency (see here). I hope that is okay with you.

Thanks!

@mseitzer I believe that you can also add a very small diagonal matrix to the covariances to make them full rank. If this value is small enough that the difference is negligible, the matrices become full rank and the score is reliable. Alternatively, you can compute the SVD C1C2 = U S V' and then reconstruct the matrix square root using only the positive singular values in S. This gives a square root that is real.
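The diagonal-offset idea can be sketched like this (a minimal sketch using `scipy.linalg.sqrtm`; the `eps` default is an assumption, not a value from the thread):

```python
import numpy as np
from scipy import linalg

def fid_with_offset(mu1, sigma1, mu2, sigma2, eps=1e-6):
    """Frechet distance between two Gaussians, adding eps to the
    covariance diagonals so the matrix square root is well behaved."""
    offset = np.eye(sigma1.shape[0]) * eps
    covmean, _ = linalg.sqrtm((sigma1 + offset).dot(sigma2 + offset),
                              disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error
    diff = mu1 - mu2
    return diff.dot(diff) + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```

Picking eps small enough that the score barely changes is exactly the "negligible difference" check suggested here.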

@mseitzer this is an interesting question -- have you had any chance to think about this or apply it to any of your results with sample sizes < 2048?

For small sample sizes another option is to train an autoencoder and use the statistics (mean, covariance) of the coding/bottleneck layer for the FID.
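A sketch of that variant, assuming you already have some `encode` function mapping images to bottleneck codes (the code dimension of 64 and the sample count are arbitrary assumptions):

```python
import numpy as np

def code_statistics(codes):
    """Mean and covariance of autoencoder bottleneck codes.

    codes: (n_samples, code_dim) array. With a low-dimensional
    bottleneck (say 64), far fewer than 2048 samples are needed
    for a full-rank covariance estimate.
    """
    return codes.mean(axis=0), np.cov(codes, rowvar=False)

# Hypothetical usage: codes = encode(images). Random arrays stand
# in for the encoder outputs here.
real_codes = np.random.randn(300, 64)   # stand-in for encode(real_images)
fake_codes = np.random.randn(300, 64)   # stand-in for encode(generated_images)
mu_r, sigma_r = code_statistics(real_codes)
mu_f, sigma_f = code_statistics(fake_codes)
```

The two sets of statistics then go into the usual Frechet distance formula in place of the Inception statistics.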

@nickkimer I do not work on this anymore unfortunately.

One experiment someone could do is to measure the sensitivity of FID to the dataset size: compute FID on differently sized subsets of the same dataset and see how it varies. Repeating this comparison across different datasets should give at least some insight into this question.
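That experiment could be set up roughly as follows (a sketch on synthetic Gaussian features standing in for Inception activations; the feature dimension and subset sizes are arbitrary choices):

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians."""
    covmean = linalg.sqrtm(sigma1.dot(sigma2), disp=False)[0].real
    diff = mu1 - mu2
    return diff.dot(diff) + np.trace(sigma1 + sigma2 - 2.0 * covmean)

rng = np.random.default_rng(0)
feats = rng.normal(size=(5000, 16))      # stand-in for real features
mu_full, sigma_full = feats.mean(axis=0), np.cov(feats, rowvar=False)

scores = {}
for n in (100, 500, 1000, 2000):
    sub = feats[rng.choice(len(feats), size=n, replace=False)]
    scores[n] = fid(sub.mean(axis=0), np.cov(sub, rowvar=False),
                    mu_full, sigma_full)
    print(n, scores[n])  # the finite-sample bias should shrink as n grows
```

Since both statistics come from the same distribution, any nonzero score here is pure estimation bias, which is the quantity the experiment is after.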

@mhex Do you have a reference for your suggestions? I have a new, small dataset, so I am planning to write a customized assessment method.