brightmart / bert_language_understanding

Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN


The reason why pre-training works.

guotong1988 opened this issue · comments

Do you think that it is mainly because BERT is bidirectional, and a CNN can also have that same property?
@brightmart Thank you!

No, it is not related to BERT or CNN specifically.
For any model, whatever its architecture, the pre-training stage can learn most of that model's parameters through a pre-training task, which is a kind of supervised task. So during fine-tuning you only need to learn a few parameters, such as the parameters of the last layer used as the classifier.
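As a minimal sketch of that idea (not this repo's actual code), the snippet below freezes a pre-trained encoder and trains only the final classification layer; the `PretrainedEncoder` class, its dimensions, and the checkpoint path are hypothetical placeholders.

```python
# Sketch: pre-training learns most parameters; fine-tuning only trains the head.
import torch
import torch.nn as nn

class PretrainedEncoder(nn.Module):
    """Stand-in for an encoder (e.g. a TextCNN) whose weights come from pre-training."""
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, hidden_dim, kernel_size=3, padding=1)

    def forward(self, token_ids):
        x = self.embedding(token_ids).transpose(1, 2)   # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))                    # (batch, hidden_dim, seq_len)
        return x.max(dim=2).values                      # max-pool over the sequence

class Classifier(nn.Module):
    def __init__(self, encoder, hidden_dim=256, num_classes=2):
        super().__init__()
        self.encoder = encoder                           # parameters learned during pre-training
        self.head = nn.Linear(hidden_dim, num_classes)   # the few parameters left to learn

    def forward(self, token_ids):
        return self.head(self.encoder(token_ids))

encoder = PretrainedEncoder()
# encoder.load_state_dict(torch.load("pretrained_encoder.pt"))  # hypothetical checkpoint
model = Classifier(encoder)

# Freeze the pre-trained encoder; only the classification head gets gradient updates.
for p in model.encoder.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)
```

In practice one can also fine-tune the whole model with a small learning rate, but the frozen-encoder setup makes the point of the answer concrete: almost all parameters already carry useful values from pre-training, so the classifier layer is the only part that must be learned from the downstream labels.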