YuanGongND / ssast

Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".

For discriminative loss, is the true NCE batch size the number of masked patches?

hillup opened this issue:

[screenshot: the discriminative (InfoNCE) loss computation in the pretraining code]
In this piece of code, the loss seems to be computed at the granularity of individual samples, i.e., one spectrogram at a time.

So even if you increase the number of GPUs, contrastive learning will not see more negative examples.

> For discriminative loss, is the true NCE batch size the number of masked patches?

In line 347 in your screenshot, the NCE loss is accumulated over all samples in the batch, but the negative samples all come from the same spectrogram. That is, say B = 12 (you have 12 spectrograms in a batch), each spectrogram has 512 patches, and you mask 400 of them. Then the number of negative samples is always 400 - 1 = 399, but the NCE loss isn't applied until it has gone through all 12 spectrograms.

> So even if you increase the number of GPUs, contrastive learning will not see more negative examples.

The number of negative samples will always be #masked_patches - 1.
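
To make this concrete, here is a minimal PyTorch sketch of the per-sample InfoNCE pattern described above (an illustration, not the repo's exact code; the function name, tensor shapes, and embedding dimension are all assumed for the example). The softmax runs only over the masked patches of the same spectrogram, so each positive competes against #masked_patches - 1 negatives no matter how large B is or how many GPUs are used.

```python
import torch
import torch.nn.functional as F

def per_sample_nce(pred, target):
    """Per-spectrogram InfoNCE (a sketch of the pattern discussed above,
    not the repo's exact code).

    pred, target: (B, num_masked, dim) predicted and true embeddings
    of the masked patches for each spectrogram in the batch.
    """
    B, num_masked, _ = pred.shape
    nce = pred.new_zeros(())
    for i in range(B):
        # Similarities between predictions and true patches, computed
        # only WITHIN spectrogram i: shape (num_masked, num_masked).
        logits = target[i] @ pred[i].T
        # For each predicted patch (a column), the matching true patch
        # is the positive; the other num_masked - 1 patches of the same
        # spectrogram are the negatives.
        log_probs = F.log_softmax(logits, dim=0)
        nce = nce + torch.diag(log_probs).sum()
    # Summing over the B samples accumulates the loss but adds no
    # negatives: each positive still competes against num_masked - 1
    # patches, e.g. 400 - 1 = 399 in the example above.
    return -nce / (B * num_masked)

# Toy usage matching the example: B = 12 spectrograms, 400 masked
# patches per spectrogram, 768-dim patch embeddings.
pred = torch.randn(12, 400, 768)
target = torch.randn(12, 400, 768)
print(per_sample_nce(pred, target))
```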