Problem about the stop gradient operation for negative pairs.
MosPicDev opened this issue
According to equation (5) in the paper, "gradients are prevented from flowing through the negatives". In the PatchNCELoss function, then, should we detach feat_k after the calculation of the pos logits, so that only the negatives are cut off from the gradient?
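
To make the question concrete, here is a minimal PyTorch sketch of the ordering I have in mind. It is not the repository's code: the function name `patch_nce_loss_sketch`, the `(N, D)` feature shapes, and the default `tau` are my own assumptions, and negatives are simply the other rows of the same batch.

```python
import torch
import torch.nn.functional as F


def patch_nce_loss_sketch(feat_q, feat_k, tau=0.07):
    """Hypothetical InfoNCE-style loss; feat_q and feat_k have shape (N, D)."""
    n = feat_q.shape[0]

    # Positive logits: row i of feat_q against row i of feat_k.
    # feat_k is NOT detached here, so gradients can reach it via the positives.
    l_pos = (feat_q * feat_k).sum(dim=1, keepdim=True)  # (N, 1)

    # Detach only now: the negatives below send no gradient into feat_k.
    feat_k_neg = feat_k.detach()
    l_neg = feat_q @ feat_k_neg.t()  # (N, N)

    # The diagonal holds the positive pairs again, so mask it out.
    diag = torch.eye(n, dtype=torch.bool, device=feat_q.device)
    l_neg = l_neg.masked_fill(diag, float("-inf"))

    # Column 0 is the positive, so the target class is 0 for every row.
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    target = torch.zeros(n, dtype=torch.long, device=feat_q.device)
    return F.cross_entropy(logits, target)


if __name__ == "__main__":
    torch.manual_seed(0)
    q = (0.1 * torch.randn(4, 8)).requires_grad_()
    k = (0.1 * torch.randn(4, 8)).requires_grad_()
    patch_nce_loss_sketch(q, k, tau=1.0).backward()
    # With this ordering, k.grad[i] depends only on q[i] (the positive pair).
    print(F.cosine_similarity(k.grad, q.detach(), dim=1))
```

With this ordering, the gradient reaching row i of feat_k comes only from its positive pair and is therefore parallel to row i of feat_q. If feat_k were detached before the positive logits, it would receive no gradient at all.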