luyug / Condenser

EMNLP 2021 - Pre-training architectures for dense retrieval

Have you tried Condenser pretraining on RoBERTa?

1024er opened this issue

I pretrained a condenser-roberta-base on the same data and hyperparameters, but the results on downstream tasks were not strong.

Have you ever tried Condenser pretraining on RoBERTa-base?

Thank you
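For reference, here is a minimal sketch of what "Condenser pretraining on RoBERTa" means architecturally: a small extra Transformer head that re-predicts masked tokens from the late `<s>` vector plus early-layer token states. The head depth, skip layer, and class wiring below are illustrative assumptions, not the repo's actual pretraining code.

```python
import torch
import torch.nn as nn
from transformers import RobertaForMaskedLM

class CondenserStyleRoberta(nn.Module):
    """Sketch of a Condenser-style head on roberta-base (illustrative only)."""

    def __init__(self, model_name="roberta-base", n_head_layers=2, skip_from=6):
        super().__init__()
        self.lm = RobertaForMaskedLM.from_pretrained(model_name)
        cfg = self.lm.config
        # Short head: extra Transformer layers that redo MLM from the
        # late <s> vector plus early-layer token states.
        layer_cls = type(self.lm.roberta.encoder.layer[0])
        self.head = nn.ModuleList([layer_cls(cfg) for _ in range(n_head_layers)])
        self.skip_from = skip_from  # which early layer feeds the head (assumption)

    def forward(self, input_ids, attention_mask, labels):
        out = self.lm.roberta(
            input_ids, attention_mask=attention_mask, output_hidden_states=True
        )
        hidden = out.hidden_states                    # embeddings + every layer
        early_tokens = hidden[self.skip_from][:, 1:]  # early states, minus <s>
        late_cls = hidden[-1][:, :1]                  # late <s> vector
        x = torch.cat([late_cls, early_tokens], dim=1)

        # Standard additive attention mask for the extra layers.
        ext_mask = (1.0 - attention_mask[:, None, None, :].float()) * -1e4
        for layer in self.head:
            x = layer(x, attention_mask=ext_mask)[0]

        logits = self.lm.lm_head(x)                   # reuse RoBERTa's MLM head
        loss = nn.CrossEntropyLoss()(                 # ignores -100 labels by default
            logits.view(-1, logits.size(-1)), labels.view(-1)
        )
        return loss
```

Training this module would then just be standard masked-language-model pretraining, e.g. with a regular MLM data collator feeding `input_ids`, `attention_mask`, and `labels`.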

Not with the same data, no. I have trained with OpenWebText (an open version of WebText, part of RoBERTa's training data) on a RoBERTa-base architecture. Compared with the BERT Condenser, it does better on sentence similarity tasks but not on retrieval tasks. As a side note, we previously observed that vanilla RoBERTa-base is typically inferior to vanilla BERT-base on retrieval tasks.

We have just started test runs with condenser-roberta-large, so there is not much to say there yet.