HobbitLong / CMC

[ECCV 2020] "Contrastive Multiview Coding", also contains implementations for MoCo and InstDis

Estimating normalization factor Z

KaimingHe opened this issue

self.params[0] = out.mean() * self.outputSize

This one-time estimation is problematic, especially if the dictionary is not random noise. Computing Z as a moving average of this per-batch estimate would give a more reasonable result.
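For concreteness, here is a minimal sketch of what an EMA-updated Z might look like. It loosely mirrors the repo's NCEAverage module, but the `momentum` argument and the `update_z` method are hypothetical additions, not the repo's actual API:

```python
import torch
import torch.nn as nn


class NCEAverage(nn.Module):
    """Simplified stand-in for the repo's NCEAverage module, showing only the
    Z estimation; `momentum` and `update_z` are hypothetical additions."""

    def __init__(self, output_size, momentum=0.99):
        super().__init__()
        self.output_size = output_size  # n: number of instances in the memory bank
        self.momentum = momentum        # EMA decay for the running Z estimate
        self.register_buffer('Z', torch.tensor(-1.0))  # -1 means "not initialized yet"

    def update_z(self, out):
        # out: exp(v . f / T) scores for the current batch, shape (batch, K+1).
        # Monte Carlo estimate of the partition function:
        #   Z ~= n * E[exp(v . f / T)]  ->  out.mean() * output_size
        z_batch = out.detach().mean() * self.output_size
        if self.Z.item() < 0:
            # one-time estimate from the first batch (the current behavior)
            self.Z.copy_(z_batch)
        else:
            # EMA update across batches (the suggested fix)
            self.Z.mul_(self.momentum).add_((1.0 - self.momentum) * z_batch)
        return self.Z.item()
```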

Hi,

Thanks for your comment! Which specific result are you referring to? Or are you suggesting that an EMA of Z could potentially improve all of InsDis, MoCo, and CMC with the NCE loss?

You reported a low number for MoCo with the NCE loss. This is because your implementation of NCE is problematic, and correcting it should give a more reasonable MoCo w/ NCE number.

@KaimingHe , yeah, the current NCE implementation is probably less suitable for MoCo, and I am happy to rectify it. What momentum multiplier for updating Z would you suggest?

0.99 for updating Z works well. In ImageNet-1K, MoCo with NCE is ~2% worse than MoCo with InfoNCE, similar to the case of the memory bank counterpart.
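For reference, plugged into the hypothetical sketch above, that suggestion would amount to:

```python
# n = 1,281,167 training images in ImageNet-1K; momentum follows the 0.99 suggestion.
# NCEAverage here is the hypothetical sketch from earlier, not the repo's actual class.
nce = NCEAverage(output_size=1281167, momentum=0.99)
```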

Thanks for your input! I have temporarily removed the NCE numbers from the README to avoid any confusion, and will leave them out until I get a chance to look into it.

Is it necessary to fix Z or update it with an EMA? Would it be unstable if we recomputed Z = out.mean() * self.outputSize on every batch? Also, I couldn't find any statement about this approximation of Z in the paper, or maybe I missed it. Could you point me to a reference for this?

Later I found the statement in the InsDis paper: "Empirically, we find the approximation derived from initial batches sufficient to work well in practice."