zomux / lanmt

LaNMT: Latent-variable Non-autoregressive Neural Machine Translation with Deterministic Inference

Using existing NLP pre-trained encoders like BERT, RoBERTa

BogdanDidenko opened this issue · comments

What do you think about combining your architecture with existing pre-trained encoders? Could using BERT as the prior_encoder help achieve better results?

@BogdanDidenko Improving the prior is a promising approach. Here is a figure showing that the BLEU score goes up monotonically as the quality of the prior improves. (It shows the interpolation between p(z|x) and q(z|x,y).)

[Figure: BLEU score under interpolation between p(z|x) and q(z|x,y)]
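
To make the interpolation concrete, here is a minimal sketch of what the figure measures: blend the mean of the prior p(z|x) with the mean of the approximate posterior q(z|x,y) and decode from the blended latent. The function and the `prior_encoder` / `posterior_encoder` / `decoder` attributes are hypothetical placeholders, not the repository's actual API.

```python
import torch

def decode_with_interpolated_latent(model, x, y, alpha):
    """alpha = 0.0 -> pure prior p(z|x); alpha = 1.0 -> pure posterior q(z|x,y)."""
    mu_prior, _ = model.prior_encoder(x)        # mean of p(z|x)
    mu_post, _ = model.posterior_encoder(x, y)  # mean of q(z|x,y)
    z = (1.0 - alpha) * mu_prior + alpha * mu_post
    return model.decoder(z, x)                  # non-autoregressive decoding from z
```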

I'm not sure whether BERT is able to do the job, but it is a promising thing to investigate. If it works in autoregressive models, it should also work in non-autoregressive models in some form.
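
As a rough sketch of the idea being discussed, a pre-trained encoder could replace the prior encoder p(z|x) by running BERT over the source tokens and projecting its hidden states to the per-position latent mean and log-variance. The `BertLatentPrior` class below and the way it would plug into LaNMT are assumptions for illustration, not part of this repository.

```python
import torch.nn as nn
from transformers import BertModel

class BertLatentPrior(nn.Module):
    """Hypothetical prior encoder p(z|x) backed by a pre-trained BERT."""

    def __init__(self, latent_dim, bert_name="bert-base-multilingual-cased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)

    def forward(self, input_ids, attention_mask):
        states = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # One latent Gaussian per source position, matching LaNMT's per-token latents.
        return self.to_mu(states), self.to_logvar(states)
```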

Yes, it's an interesting research area. In my experience with BERT and an autoregressive Transformer decoder, I achieved roughly a 10% quality improvement on my seq2seq task (with RoBERTa the result was even better). But I used some tricks, so it's hard to say how it will work with the proposed approach.