CLS of which layers to use in Condenser? last layer CLS? sum of last four layers CLS?
mahdiabdollahpour opened this issue · comments
Hi
Thanks for the nice repo. After pretraining, Condenser has the same architecture as BERT (the Condenser heads are removed). Which layer's CLS worked best for neural IR: the last layer's CLS, the sum of the last four layers' CLS, or something else?
We fine-tune the last backbone layer's CLS, which is the one passed to the head during pre-training.
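For readers comparing the two options in the question, here is a minimal sketch of how each CLS vector could be extracted with Hugging Face `transformers`. It is not code from this repo. The tiny, randomly initialised `BertModel` is only a stand-in so the snippet runs without downloading anything, and the config values and variable names are illustrative assumptions. With a real checkpoint you would load the pre-trained weights through `BertModel.from_pretrained(...)` instead.

```python
import torch
from transformers import BertConfig, BertModel

# Assumption: a tiny random BERT stands in for a pre-trained Condenser
# checkpoint, which has the same architecture once the heads are removed.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=4,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(config)
model.eval()

# Dummy batch: 2 sequences of 8 token ids; position 0 plays the role of [CLS].
input_ids = torch.randint(0, config.vocab_size, (2, 8))

with torch.no_grad():
    out = model(input_ids=input_ids, output_hidden_states=True)

# Option 1 (the one described in the answer above): last backbone layer's CLS.
last_layer_cls = out.last_hidden_state[:, 0]

# Option 2 (from the question): sum of the CLS vectors of the last four layers.
# out.hidden_states holds the embedding output plus one tensor per layer.
sum_last_four_cls = torch.stack(out.hidden_states[-4:]).sum(dim=0)[:, 0]

print(last_layer_cls.shape, sum_last_four_cls.shape)
```

Both vectors have shape `(batch_size, hidden_size)`, so either can be used as the query or passage embedding. Only the first matches what the head received during pre-training.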
Closing for now. Feel free to re-open if you have new questions.