SHIJINGLI0206 / NoisyNER---named-entity-recognition-in-social-media

NoisyNER - named entity recognition in social media

Repository of the bachelor thesis NoisyNER - named entity recognition in social media.

Requirements

Python 3.6.1
Also install the packages listed in requirements.txt (for example with pip install -r requirements.txt).

Usage

To preprocess the data, run createData.py; for the PyTorch classifier, also run buildVocab.py.
training: python train.py
To train the PyTorch classifier with the pre-trained character embedding, put the GloVe embedding file (glove.twitter.27B.200d.txt), available at https://nlp.stanford.edu/projects/glove/, in the following folder: NoisyNER/project/pytorch/Data/embed/glove.twitter.27B

evaluation: python predict.py --eval
prediction: python predict.py "input sentence"
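
The GloVe file mentioned above is a plain-text file in which each line holds a token followed by its space-separated vector components (200 values per line for glove.twitter.27B.200d.txt). How train.py reads the file is not documented here, so the following is only a minimal sketch for inspecting the file or checking that it sits in the right folder. The function name load_glove and its dim parameter are hypothetical and are not taken from this repository.

```python
def load_glove(path, dim=200):
    """Read a GloVe text file into a dict mapping token -> list of floats.

    Hypothetical helper for illustration; not part of the NoisyNER code.
    """
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Split from the right so that exactly `dim` numbers are taken
            # and whatever remains on the left is treated as the token.
            parts = line.rstrip("\n").rsplit(" ", dim)
            if len(parts) != dim + 1:
                continue  # skip rows that do not have `dim` components
            token, values = parts[0], parts[1:]
            vectors[token] = [float(v) for v in values]
    return vectors
```

Calling load_glove with the path of glove.twitter.27B.200d.txt inside the folder named above would return the full vocabulary as a dictionary. The full file is large, so expect this to take a while and to use a lot of memory.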

Final Results on the WNUT dataset

Classifier            F1 score
Unigram               5.10
Bigram - backoff      5.29
Trigram - backoff     5.29
Decision Tree         8.15
Bernoulli NB          17.61
Multinomial NB        13.91
SVM                   12.88
GradientBoosting      10.79
LogisticRegression    10.79
BI-LSTM CRF           13.18
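
This README does not state how the F1 scores above are computed. As a rough illustration only, the sketch below assumes entity-level F1 over BIO-tagged sequences, a common convention for NER evaluation. Whether predict.py --eval uses this convention is an assumption. The names extract_entities and entity_f1 are hypothetical and do not come from this repository.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans in one BIO tag sequence.

    Assumption: tags look like "B-person", "I-person" or "O". A stray "I-"
    tag with no matching open entity is treated as the start of a new one.
    """
    entities = set()
    start, ent_type = None, None
    # The trailing "O" is a sentinel that closes an entity at the sequence end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and ent_type == tag[2:]
        if start is not None and not continues:
            entities.add((start, i, ent_type))
            start, ent_type = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, ent_type = i, tag[2:]
    return entities


def entity_f1(gold_sentences, pred_sentences):
    """Entity-level F1: a prediction counts only if span and type both match."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this convention a partially matched entity earns no credit, so scores are typically lower than token-level accuracy on the same predictions.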
