bioinf-jku / TTUR

Two time-scale update rule for training GANs

FID's applicability for smaller datasets

mseitzer opened this issue · comments

Hi,

I have a few questions about FID score:

  1. I have a dataset smaller than 2048 images, but I still want to compute the FID score. I understand that 2048 images are required to get a full-rank covariance matrix. Computing the FID on my dataset still gives sensible values, with no complex numbers, NaNs, or warnings. Can I still trust an FID computed this way as a measure of visual quality?

  2. In this paper, section 5.1, it is noted:

We observe that FID has rather high bias, but small variance. From this perspective, estimating the full covariance matrix might be unnecessary and counter-productive, and a constrained version might suffice.

What is your take on this? Does this hint at FID being usable with smaller datasets?

  3. Do you think an FID computed on lower-level feature maps of Inception is still meaningful? Since the features at lower levels still have spatial extent, spatial pooling would have to be applied first. My thought here is to try to make FID work with smaller datasets.
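The pooling step in question 3 could be sketched as follows (a minimal sketch: the feature-map shape, sample count, and choice of intermediate layer are assumptions for illustration, not values from the thread):

```python
import numpy as np

def pool_feature_maps(fmaps):
    """Global average pooling over the spatial axes.

    fmaps: array of shape (N, C, H, W), e.g. an intermediate
    Inception activation (which layer to tap is an open choice).
    Returns an (N, C) matrix whose rows can go into the usual
    mean/covariance computation for FID.
    """
    return fmaps.mean(axis=(2, 3))

# With C much smaller than 2048, far fewer samples suffice for a
# full-rank covariance estimate (shapes below are hypothetical).
feats = pool_feature_maps(np.random.rand(500, 192, 17, 17))
mu, sigma = feats.mean(axis=0), np.cov(feats, rowvar=False)
```

The lower feature dimensionality is what would make the covariance estimable from a small dataset.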

I also ported your implementation to PyTorch, for people who do not want to have the Tensorflow dependency (see here). I hope that is okay with you.

Thanks!

@mseitzer I believe that you can also add a very small diagonal matrix to the covariances to make them full rank. If this value is small enough that the difference is negligible, the matrices become full rank and the score is reliable. Alternatively, you can compute the SVD C1C2 = U S V' and then reconstruct the matrix square root using only the positive singular values in S. This gives a square root that is real.
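The diagonal-offset idea can be sketched like this (a minimal sketch using `scipy.linalg.sqrtm`; the `eps` default is an assumption, not a value from the thread):

```python
import numpy as np
from scipy import linalg

def fid_with_offset(mu1, sigma1, mu2, sigma2, eps=1e-6):
    """Frechet distance between two Gaussians, adding eps to the
    covariance diagonals so the matrix square root is well behaved."""
    offset = np.eye(sigma1.shape[0]) * eps
    covmean, _ = linalg.sqrtm((sigma1 + offset).dot(sigma2 + offset),
                              disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error
    diff = mu1 - mu2
    return diff.dot(diff) + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```

Picking eps small enough that the score barely changes is exactly the "negligible difference" check suggested here.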

@mseitzer this is an interesting question -- have you had any chance to think about this or apply it to any of your results with sample sizes < 2048?

For small sample sizes another option is to train an autoencoder and use the statistics (mean, covariance) of the coding/bottleneck layer for the FID.
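A sketch of that variant, assuming you already have some `encode` function mapping images to bottleneck codes (the code dimension of 64 and the sample count are arbitrary assumptions):

```python
import numpy as np

def code_statistics(codes):
    """Mean and covariance of autoencoder bottleneck codes.

    codes: (n_samples, code_dim) array. With a low-dimensional
    bottleneck (say 64), far fewer than 2048 samples are needed
    for a full-rank covariance estimate.
    """
    return codes.mean(axis=0), np.cov(codes, rowvar=False)

# Hypothetical usage: codes = encode(images). Random arrays stand
# in for the encoder outputs here.
real_codes = np.random.randn(300, 64)   # stand-in for encode(real_images)
fake_codes = np.random.randn(300, 64)   # stand-in for encode(generated_images)
mu_r, sigma_r = code_statistics(real_codes)
mu_f, sigma_f = code_statistics(fake_codes)
```

The two sets of statistics then go into the usual Frechet distance formula in place of the Inception statistics.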

@nickkimer I do not work on this anymore unfortunately.

One experiment someone could do is to measure the sensitivity of FID to the dataset size: compute FID on differently sized subsets of the same dataset and see how it varies. Repeating this comparison across different datasets should give at least some insight into this question.
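That experiment could be set up roughly as follows (a sketch on synthetic Gaussian features standing in for Inception activations; the feature dimension and subset sizes are arbitrary choices):

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians."""
    covmean = linalg.sqrtm(sigma1.dot(sigma2), disp=False)[0].real
    diff = mu1 - mu2
    return diff.dot(diff) + np.trace(sigma1 + sigma2 - 2.0 * covmean)

rng = np.random.default_rng(0)
feats = rng.normal(size=(5000, 16))      # stand-in for real features
mu_full, sigma_full = feats.mean(axis=0), np.cov(feats, rowvar=False)

scores = {}
for n in (100, 500, 1000, 2000):
    sub = feats[rng.choice(len(feats), size=n, replace=False)]
    scores[n] = fid(sub.mean(axis=0), np.cov(sub, rowvar=False),
                    mu_full, sigma_full)
    print(n, scores[n])  # the finite-sample bias should shrink as n grows
```

Since both statistics come from the same distribution, any nonzero score here is pure estimation bias, which is the quantity the experiment is after.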

@mhex Do you have a reference for your suggestions? I have a new, small dataset, so I am planning to write a customized assessment method.