Why does pre-training work?
guotong1988 opened this issue
Do you think it works mainly because BERT is bidirectional, and a CNN could serve the same function?
@brightmart Thank you!
No, it is not specific to BERT or CNNs.
For any model, regardless of architecture, the pre-training stage learns most of the model's parameters through a pre-training task, which is itself a kind of supervised learning (the labels are derived from the data). During fine-tuning, you therefore only need to learn a few parameters, such as those of the last layer, which acts as the classifier.
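To illustrate the point above, here is a minimal sketch in plain Python (no specific library assumed): the encoder's weights stand in for parameters learned during pre-training and are kept frozen, while fine-tuning updates only the last-layer classifier. The feature values and data are invented for the example, not taken from any real model.

```python
import math
import random

random.seed(0)

# Stand-in for a pre-trained encoder: a fixed (frozen) feature map.
# In a real setup these weights would come from the pre-training task.
FROZEN_W = [[0.5, -0.3], [0.1, 0.8]]

def encode(x):
    """Frozen encoder: never updated during fine-tuning."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in FROZEN_W]

# Fine-tuning learns only the last-layer classifier (w, b).
w, b = [0.0, 0.0], 0.0

def predict(x):
    """Logistic-regression head on top of the frozen features."""
    z = sum(wi * hi for wi, hi in zip(w, encode(x))) + b
    return 1 / (1 + math.exp(-z))  # sigmoid

# Tiny labeled data set for the downstream task (hypothetical).
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]

lr = 1.0
for _ in range(200):
    for x, y in data:
        h = encode(x)
        err = predict(x) - y          # gradient of log-loss w.r.t. z
        for i in range(2):
            w[i] -= lr * err * h[i]   # update classifier weights only
        b -= lr * err

print([round(predict(x)) for x, _ in data])  # [1, 0]
```

Only `w` and `b` ever change; the encoder's parameters stay exactly as pre-training left them, which is why fine-tuning needs so little labeled data.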