brightmart / bert_language_understanding

Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN


different results when setting lr to 0.001

liu-nlper opened this issue · comments

commented

I modified the learning rate to 0.001 and kept the other settings at their defaults, then tested on the same dataset without pre-training. My experimental results are quite different from yours: pre-training accelerates convergence, but it may lead to worse final performance.

| Epoch | valid loss (mine) | valid F1 (mine) | valid F1 (this repo) |
|-------|-------------------|-----------------|----------------------|
| 1     | 4.606             | 57.9            | 58.0                 |
| 5     | 2.234             | 71.3            | 74.0                 |
| 7     | 1.774             | 73.0            | 75.0                 |
| 15    | 1.449             | 75.3            | -                    |
| 35    | -                 | -               | 75.0                 |
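
For anyone reproducing this, a change like the one described above would normally go through the training script's hyperparameter flags rather than the model code. The sketch below only illustrates that kind of override in a TensorFlow 1.x flags setup; the flag names and the decay schedule are assumptions, not necessarily what this repository uses.

```python
# Illustrative TensorFlow 1.x flags setup for the learning-rate change discussed
# above. Flag names and the decay schedule are assumptions, not this repository's
# actual configuration.
import tensorflow as tf

tf.app.flags.DEFINE_float("learning_rate", 0.001, "initial learning rate (0.001 in this experiment)")
tf.app.flags.DEFINE_integer("decay_steps", 1000, "decay the learning rate every this many steps")
tf.app.flags.DEFINE_float("decay_rate", 0.9, "multiplicative decay factor")
FLAGS = tf.app.flags.FLAGS

def learning_rate(global_step):
    # Exponential decay is one common choice; the script being discussed
    # may use a different schedule.
    return tf.train.exponential_decay(FLAGS.learning_rate, global_step,
                                      FLAGS.decay_steps, FLAGS.decay_rate,
                                      staircase=True)
```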

What's the performance with fine-tuning? Can you also add fine-tuning results to make a comparison?
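
For context, "fine-tuning" here would mean restoring the weights saved after masked-LM pre-training and then continuing supervised training on the labeled data. The sketch below is a minimal TensorFlow 1.x illustration of that step; the checkpoint directory name is an assumption, not necessarily the path this repository writes to.

```python
# Minimal sketch of restoring pre-trained weights before supervised fine-tuning
# (TensorFlow 1.x). The checkpoint directory is a hypothetical name.
import tensorflow as tf

PRETRAIN_CKPT_DIR = "checkpoint_lm/"  # assumed location of masked-LM weights

def init_from_pretrained(sess, saver):
    ckpt = tf.train.latest_checkpoint(PRETRAIN_CKPT_DIR)
    if ckpt is not None:
        saver.restore(sess, ckpt)  # start fine-tuning from pre-trained weights
        print("Restored pre-trained weights from", ckpt)
    else:
        sess.run(tf.global_variables_initializer())  # fall back: train from scratch
        print("No pre-trained checkpoint found; training from scratch")
```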

commented

I haven't compared with fine-tuning yet. Which corpus do you use to train the masked language model (MLM)?

The same as the training data; no need to change any code.
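
To illustrate what "same as the training data" implies: masked-LM examples can be generated directly from the task's own sentences by randomly masking a fraction of the tokens. The sketch below is a generic illustration of that idea (a ~15% masking rate, as in BERT), not the exact masking routine used in this repository.

```python
# Generic sketch of building masked-LM examples from the same task corpus.
# The 15% masking rate follows the BERT paper; the repository's exact
# masking routine may differ.
import random

MASK_TOKEN = "[MASK]"  # assumed mask symbol

def make_mlm_example(tokens, mask_prob=0.15):
    """Randomly replace some tokens with MASK_TOKEN; return inputs and labels."""
    inputs, labels = [], []
    for token in tokens:
        if random.random() < mask_prob:
            inputs.append(MASK_TOKEN)
            labels.append(token)      # model must predict the original token
        else:
            inputs.append(token)
            labels.append(None)       # no prediction needed at this position
    return inputs, labels

# Example: reuse a sentence from the task's training set as pre-training data.
sentence = "pre-training can accelerate the convergence speed".split()
print(make_mlm_example(sentence))
```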

@liu-nlper did you get results to compare?