zomux / lanmt

LaNMT: Latent-variable Non-autoregressive Neural Machine Translation with Deterministic Inference

Using existing NLP pre-trained encoders like BERT, RoBERTa

BogdanDidenko opened this issue · comments

What do you think about combining your architecture with existing pre-trained encoders? Could using BERT as the prior_encoder help achieve better results?

@BogdanDidenko Improving the prior is a promising approach. Here is a figure showing that the BLEU score goes up monotonically as the quality of the prior improves. (It shows the interpolation between p(z|x) and q(z|x,y).)

[Figure: BLEU score under interpolation between p(z|x) and q(z|x,y)]
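
To make the interpolation concrete, here is a minimal sketch of what the figure measures: blend the mean of the prior p(z|x) with the mean of the approximate posterior q(z|x,y) and decode from the blended latent. The function and the `prior_encoder` / `posterior_encoder` / `decoder` attributes are hypothetical placeholders, not the repository's actual API.

```python
import torch

def decode_with_interpolated_latent(model, x, y, alpha):
    """alpha = 0.0 -> pure prior p(z|x); alpha = 1.0 -> pure posterior q(z|x,y)."""
    mu_prior, _ = model.prior_encoder(x)        # mean of p(z|x)
    mu_post, _ = model.posterior_encoder(x, y)  # mean of q(z|x,y)
    z = (1.0 - alpha) * mu_prior + alpha * mu_post
    return model.decoder(z, x)                  # non-autoregressive decoding from z
```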

I'm not sure whether BERT is able to do the job, but it is a promising thing to investigate. If it works in autoregressive models, it should also work in non-autoregressive models in some form.
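
As a rough sketch of the idea being discussed, a pre-trained encoder could replace the prior encoder p(z|x) by running BERT over the source tokens and projecting its hidden states to the per-position latent mean and log-variance. The `BertLatentPrior` class below and the way it would plug into LaNMT are assumptions for illustration, not part of this repository.

```python
import torch.nn as nn
from transformers import BertModel

class BertLatentPrior(nn.Module):
    """Hypothetical prior encoder p(z|x) backed by a pre-trained BERT."""

    def __init__(self, latent_dim, bert_name="bert-base-multilingual-cased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)

    def forward(self, input_ids, attention_mask):
        states = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # One latent Gaussian per source position, matching LaNMT's per-token latents.
        return self.to_mu(states), self.to_logvar(states)
```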

Yes, it's an interesting research area. In my experience with BERT and an autoregressive Transformer decoder, I achieved roughly a 10% quality improvement on my seq2seq task (with RoBERTa the result was even better). But I used some tricks, so it's hard to say how it will work with the proposed approach.