Dropout / unsupervised training implementation?

Question


Natooz opened this issue a year ago

Hello, 👋

First, thank you for the high quality of your code and for the support you give on issues and PRs.
For a research project, I am fine-tuning models on a contrastive objective, and I intend to use your unsupervised method.
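For context, here is a minimal sketch of an unsupervised SimCSE-style contrastive loss. It assumes the InfoNCE form with in-batch negatives: each sentence is encoded twice (two dropout-noised views), row $i$ of the first view is paired with row $i$ of the second as the positive, and every other row of the second view serves as a negative. The function names and the temperature `tau = 0.05` are illustrative assumptions, and plain Python lists stand in for embedding tensors.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_unsup_loss(h: List[List[float]], h_pos: List[List[float]],
                      tau: float = 0.05) -> float:
    """Mean InfoNCE loss over the batch (hypothetical helper).

    Row i of `h` (view of x) is paired with row i of `h_pos` (view of x+);
    all other rows of `h_pos` act as in-batch negatives.
    """
    n = len(h)
    total = 0.0
    for i in range(n):
        # Similarity of anchor i to every candidate, scaled by temperature.
        logits = [cosine(h[i], h_pos[j]) / tau for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # -log softmax probability of the positive (diagonal) entry.
        total += log_denom - logits[i]
    return total / n
```

With well-aligned positive pairs the loss approaches zero; if the positives are swapped with the negatives it grows to roughly `1 / tau`.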

Looking at the SimCSE code, I noticed that in the forward pass all sequences, $x$ and $x^+$, are passed through the encoder in the same batch. But doing so will apply the same dropout mask to all of them.
My workaround is to perform two forward passes through the BERT encoder, but I still wonder: did you do the same, or am I missing something?
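One way to probe the dropout question is to look at how the mask is sampled. The sketch below is a toy, pure-Python model that assumes standard inverted dropout drawing an independent Bernoulli sample for every element of the input, which is how PyTorch's `nn.Dropout` is documented to behave in training mode. It duplicates one sentence vector into a two-row batch (standing in for $x$ and $x^+$) and applies a single dropout call to the whole batch. The names `inverted_dropout` and `same_batch_views` are hypothetical and do not come from the SimCSE code.

```python
import random
from typing import List, Tuple


def inverted_dropout(batch: List[List[float]], p: float,
                     rng: random.Random) -> List[List[float]]:
    """Zero each element independently with probability p and scale
    the survivors by 1 / (1 - p), as in standard inverted dropout."""
    keep = 1.0 - p
    return [[x / keep if rng.random() >= p else 0.0 for x in row]
            for row in batch]


def same_batch_views(sentence: List[float], p: float,
                     seed: int = 0) -> Tuple[List[float], List[float]]:
    """Put the same sentence in two rows of one batch (x and x+) and
    run a single dropout call over the whole batch."""
    rng = random.Random(seed)
    out = inverted_dropout([sentence, list(sentence)], p, rng)
    return out[0], out[1]


# Usage: compare the two rows produced by one batched dropout call.
a, b = same_batch_views([1.0] * 8, p=0.5)
print(a)
print(b)
```

Under this per-element assumption, the two rows get their own masks even within one batch, and calling `inverted_dropout` twice on single-row batches (the two-forward-pass workaround) draws masks from the same distribution.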