kmkurn / pytorch-crf

(Linear-chain) Conditional random field in PyTorch.

Home Page: https://pytorch-crf.readthedocs.io

Error increases, F1 decreases

Fei-Wang opened this issue

NER task.

  1. With BERT + softmax + cross-entropy, the train loss decreases and F1 increases, and the validation data behaves like the training data.
  2. With BERT + CRF, the train loss first decreases and then increases, and F1 first increases and then decreases. The validation data behaves normally, though.

I found that this happens because I use dropout between BERT and the linear layer. When I set model.eval, it behaves correctly, but when I set model.train, the metric is low. As far as I know, dropout is used to avoid overfitting, so it shouldn't make such a large difference between model.eval and model.train.
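The gap between the two modes can be reproduced in isolation. The snippet below is only a minimal sketch, not the model from this issue: `head` is a hypothetical stand-in for the "encoder output, dropout, linear" stack, and `features` is made-up data. It shows that the same input gives different outputs on each forward pass in train mode, while eval mode is deterministic, which is why metrics computed under model.train can look worse than those computed under model.eval.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the "BERT output -> dropout -> linear" head.
head = nn.Sequential(nn.Dropout(p=0.5), nn.Linear(8, 3))
features = torch.randn(4, 8)  # made-up encoder features

# Train mode: dropout zeroes random units and rescales the rest by 1 / (1 - p),
# so two forward passes over the same input use different masks.
head.train()
train_out_a = head(features)
train_out_b = head(features)

# Eval mode: dropout is a no-op, so repeated forward passes agree.
head.eval()
with torch.no_grad():
    eval_out_a = head(features)
    eval_out_b = head(features)

train_outputs_differ = not torch.equal(train_out_a, train_out_b)
eval_outputs_match = torch.equal(eval_out_a, eval_out_b)
print(train_outputs_differ, eval_outputs_match)
```

When comparing train and validation F1, it may help to compute both under model.eval so that dropout noise is not part of the comparison.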

Train loss should always decrease if you set the learning rate small enough. Have you tried a smaller learning rate? Also, can you check whether you can overfit a small portion of the training set?
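The overfitting check suggested here could look roughly like the following. This is a sketch under stated assumptions, not the setup from the issue: the data is random, the model is a single linear layer with cross-entropy standing in for BERT + CRF, and the learning rate and step count are arbitrary. With a CRF layer, the loss would instead be the negative of the log-likelihood that the layer returns.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Made-up tiny batch: 8 sequences, 5 tokens each, 16 features, 4 tags.
features = torch.randn(8, 5, 16)
tags = torch.randint(0, 4, (8, 5))

# Stand-in tagger (one linear layer), not the BERT + CRF model from the issue.
model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

losses = []
model.train()
for step in range(200):
    optimizer.zero_grad()
    logits = model(features)  # shape: (8, 5, 4)
    # Flatten tokens so each one is a separate classification example.
    loss = loss_fn(logits.reshape(-1, 4), tags.reshape(-1))
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

If the loss on such a fixed small batch does not fall steadily, the learning rate or the loss computation is a more likely culprit than dropout.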

Thank you for your reply, I will figure it out.