In this project, we practice several methods for solving sequence labeling problems on two tasks, Part-of-Speech (POS) tagging and Named Entity Recognition (NER), and examine the differences and challenges of each.
The dataset for this project is the Penn Treebank (PTB) dataset, which is accessible through the nltk library.
* To run the code, if you want to read the emission matrix for the Viterbi algorithm from file, make sure you have the “emission_ptb.csv” and “NER_emission_15.csv” files.
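As a rough illustration of the data access described above, the sketch below loads the PTB sample that ships with nltk and splits it into training and test sentences. The helper names `load_treebank_sentences` and `split_sentences`, the 80/20 split, and the fixed seed are assumptions made for this example and are not taken from the project code.

```python
import random


def load_treebank_sentences():
    """Return the PTB sample from nltk as lists of (word, tag) pairs."""
    # Imported lazily so split_sentences can be used without nltk installed.
    import nltk
    from nltk.corpus import treebank

    nltk.download("treebank", quiet=True)  # fetches the corpus on first use
    return [list(sent) for sent in treebank.tagged_sents()]


def split_sentences(sentences, test_ratio=0.2, seed=0):
    """Shuffle tagged sentences and split them into (train, test) lists."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    sents = load_treebank_sentences()
    train, test = split_sentences(sents)
    print(len(train), "training sentences,", len(test), "test sentences")
    print(train[0][:5])  # first few (word, tag) pairs of one sentence
```

Each sentence is a list of `(word, tag)` tuples, which is a convenient input both for counting HMM statistics and for building index sequences for a neural tagger.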
- Importing Penn Treebank dataset using nltk
- Explanation of Markov chain & Hidden Markov Model (HMM)
- Explanation of Transition Matrix & Emission Probabilities
- Implementing Viterbi algorithm
- POS tagging using Viterbi algorithm
- POS tagging using RNN
- POS tagging using LSTM
- POS tagging using GRU
- Explaining LSTM & GRU gates
- Discussing LSTM vs GRU
- Comparing & analyzing results
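The transition matrix, emission probabilities, and Viterbi decoding listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: it estimates probabilities by counting with additive smoothing instead of reading them from the CSV files, works in log space, and maps unseen words to a hypothetical `<UNK>` token. The names `train_hmm`, `viterbi`, and `log_normalize` are made up for this example.

```python
import math
from collections import Counter, defaultdict

UNK = "<UNK>"  # assumed placeholder for words not seen in training


def log_normalize(counts, keys, smoothing):
    """Turn raw counts into smoothed log-probabilities over `keys`."""
    total = sum(counts[k] for k in keys) + smoothing * len(keys)
    return {k: math.log((counts[k] + smoothing) / total) for k in keys}


def train_hmm(tagged_sentences, smoothing=1e-3):
    """Estimate start, transition and emission log-probabilities by counting."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sent in tagged_sentences:
        prev = None
        for word, tag in sent:
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    tags = sorted(emit)
    vocab = sorted({w for c in emit.values() for w in c} | {UNK})
    log_start = log_normalize(start, tags, smoothing)
    log_trans = {t: log_normalize(trans[t], tags, smoothing) for t in tags}
    log_emit = {t: log_normalize(emit[t], vocab, smoothing) for t in tags}
    return tags, log_start, log_trans, log_emit


def viterbi(words, tags, log_start, log_trans, log_emit):
    """Return the most likely tag sequence for `words` under the HMM."""
    if not words:
        return []
    obs = [w if w in log_emit[tags[0]] else UNK for w in words]
    # best[i][t] is the log-probability of the best path ending in tag t at i.
    best = [{t: log_start[t] + log_emit[t][obs[0]] for t in tags}]
    back = []  # back[i][t] is the best previous tag for tag t at position i+1
    for w in obs[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + log_trans[p][t])
            scores[t] = best[-1][prev] + log_trans[prev][t] + log_emit[t][w]
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    toy = [
        [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
        [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    ]
    model = train_hmm(toy)
    print(viterbi(["the", "cat", "barks"], *model))
```

Working in log space avoids numerical underflow on long sentences, since the product of many small probabilities becomes a sum of log-probabilities.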
- Importing Penn Treebank dataset using nltk
- Explanation of IOB tag
- NER using adjusted Viterbi algorithm
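To make the IOB scheme mentioned above concrete: `B-X` marks the first token of an entity of type X, `I-X` marks a token that continues it, and `O` marks a token outside any entity. The sketch below groups a tagged sentence into entity spans. The function name `iob_to_spans` and the label names in the example (`PERSON`, `ORG`, `DATE`) are illustrative assumptions, not necessarily the label set used in the project.

```python
def iob_to_spans(tokens, tags):
    """Group tokens into (entity_type, text) spans based on IOB tags."""
    spans = []
    current = None  # [entity_type, [tokens]] for the span being built
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        # An I- tag of the same type extends the open span.
        if prefix == "I" and current is not None and current[0] == label:
            current[1].append(token)
            continue
        # Anything else closes the open span, if there is one.
        if current is not None:
            spans.append((current[0], " ".join(current[1])))
            current = None
        # A B- tag (or a stray I- tag) opens a new span; O opens nothing.
        if prefix in ("B", "I"):
            current = [label, [token]]
    if current is not None:
        spans.append((current[0], " ".join(current[1])))
    return spans


if __name__ == "__main__":
    tokens = ["Pierre", "Vinken", "joined", "Elsevier", "N.V.", "in", "November"]
    tags = ["B-PERSON", "I-PERSON", "O", "B-ORG", "I-ORG", "O", "B-DATE"]
    for entity_type, text in iob_to_spans(tokens, tags):
        print(entity_type, text)
```

Because each IOB tag is just another label per token, NER can reuse the same sequence-labeling machinery as POS tagging, with IOB tags taking the place of POS tags as the hidden states.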
Report is available here.