v-mipeng / LexiconNER

Lexicon-based Named Entity Recognition

Are the "train.XXX.txt" files generated by dictionaries?

LorrinWWW opened this issue

I merged the datasets for all entity types (i.e. all the train.XXX.txt files) and trained a vanilla BiLSTM+CRF directly on the merged data. The overall F1 exceeded 90.0, which seems unreasonably high if the training labels were generated by dictionaries. Did I misunderstand anything? Many thanks!
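For reference, here is a rough sketch of the kind of merge I mean. It is not the repository's code: the function name `merge_type_tags`, the BIO-style tags, and the rule that the first-listed type wins on a conflict are all my own assumptions, and the real train.XXX.txt format may differ.

```python
from typing import Dict, List


def merge_type_tags(per_type_tags: Dict[str, List[str]]) -> List[str]:
    """Merge per-entity-type tag sequences for one sentence into one sequence.

    Assumptions (hypothetical, not taken from the repository):
    - every sequence labels the same tokens, in the same order;
    - "O" marks a non-entity token, anything else marks an entity token;
    - if two types label the same token, the type listed first wins.
    """
    lengths = {len(tags) for tags in per_type_tags.values()}
    if len(lengths) > 1:
        raise ValueError("all tag sequences must cover the same tokens")
    n = lengths.pop() if lengths else 0

    merged = ["O"] * n
    for tags in per_type_tags.values():
        for i, tag in enumerate(tags):
            # Only fill positions that no earlier type has claimed.
            if merged[i] == "O" and tag != "O":
                merged[i] = tag
    return merged


# Tags for the tokens "John", "lives", "in", "Paris".
example = {
    "PER": ["B-PER", "O", "O", "O"],
    "LOC": ["O", "O", "O", "B-LOC"],
}
print(merge_type_tags(example))
```

Resolving conflicts token by token is a simplification: when two types overlap only partially, it can truncate one of the spans, so a span-level rule may be preferable in practice.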

I used 100-dimensional GloVe embeddings and 30-dimensional character embeddings (produced by an LSTM).
The hidden dimension is 200 (i.e. 100 for each direction). The dropout rate is 0.5. The optimizer is SGD with a learning rate of 0.01. The batch size is 32.
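To make the setup easier to reproduce, the settings above can be collected in one place. This is only a sketch: the class name `BiLSTMCRFConfig` and its field names are hypothetical, and concatenating the word and character embeddings to form the LSTM input is my assumption about the usual BiLSTM+CRF design, not something specific to this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiLSTMCRFConfig:
    """Hyperparameters from this issue; names are illustrative only."""

    word_emb_dim: int = 100       # GloVe embeddings
    char_emb_dim: int = 30        # character embeddings from an LSTM
    hidden_dim: int = 200         # total over both directions
    dropout: float = 0.5
    optimizer: str = "sgd"
    learning_rate: float = 0.01
    batch_size: int = 32

    @property
    def lstm_input_dim(self) -> int:
        # Assumes word and character embeddings are concatenated per token.
        return self.word_emb_dim + self.char_emb_dim

    @property
    def hidden_dim_per_direction(self) -> int:
        # A bidirectional LSTM splits the hidden size across two directions.
        return self.hidden_dim // 2


cfg = BiLSTMCRFConfig()
print(cfg.lstm_input_dim, cfg.hidden_dim_per_direction)
```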