the pre-trained MLM performance
yyht opened this issue · comments
Hi. The corpus is too small for the pretraining stage. I think you need millions of sentences, at least one million.
It's easy to get raw data for the pretraining stage, as long as each line contains a document or one or more sentences.
It's also common sense to use a large corpus when training word embeddings; the same applies to pretraining a language model.
Let me know the result after pretraining the masked language model on a lot more data.
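For reference, the core masking step of MLM pretraining looks roughly like this (a minimal sketch over a plain token list, assuming BERT's usual ~15% masking rate; real BERT additionally replaces some chosen positions with random or unchanged tokens instead of `[MASK]`):

```python
import random

MASK_TOKEN = "[MASK]"
MASK_PROB = 0.15  # BERT masks roughly 15% of input tokens

def mask_tokens(tokens, rng=random):
    """Return (masked_tokens, labels). labels holds the original token
    at masked positions and None everywhere else, so the loss is only
    computed where a token was actually masked."""
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < MASK_PROB:
            masked.append(MASK_TOKEN)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels
```

With only a small corpus, the model sees too few distinct masked contexts to learn useful representations, which is why the sentence count matters so much here.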
Hi, I tried your bert_model rather than bert_cnn_model. bert_model got about a 75% F1 score on the masked language model task, but when I used the pretrained bert_model to fine-tune on the classification task, it didn't work: the F1 score was still only about 10% after several epochs. Is something wrong with bert_model?
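One thing worth ruling out: an F1 stuck around 10% (roughly chance level for a ~10-class task) often means the pretrained weights were never actually restored, so the model fine-tunes from random initialization. A minimal, framework-agnostic check is to diff the variable names the fine-tuning graph expects against the names stored in the checkpoint (all names below are hypothetical, for illustration only):

```python
def unrestored_variables(model_vars, checkpoint_vars):
    """Return the model variables that have no matching name in the
    checkpoint; these stay randomly initialized during fine-tuning."""
    ckpt = set(checkpoint_vars)
    return sorted(v for v in model_vars if v not in ckpt)

# Hypothetical variable names for illustration only.
model = [
    "bert/embeddings/word_embeddings",
    "bert/encoder/layer_0/kernel",
    "classifier/output_weights",
]
ckpt = [
    "bert/embeddings/word_embeddings",
    "bert/encoder/layer_0/kernel",
]
print(unrestored_variables(model, ckpt))  # → ['classifier/output_weights']
```

If anything beyond the new classification head shows up as unrestored, the checkpoint path or variable scoping is the likely culprit rather than the model itself.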