POS-Tagging-And-NER-Using-RNN-LSTM-GRU-Viterbi-algorithm

In this project, we practice several methods for sequence labeling on two tasks, Part-of-Speech (POS) Tagging and Named Entity Recognition (NER), and examine the differences and challenges of each.

Dataset

The dataset for this project is the Penn Treebank (PTB), which is available through the nltk library.

* To run the code with precomputed emission matrices for the Viterbi algorithm instead of recomputing them, make sure the files “emission_ptb.csv” and “NER_emission_15.csv” are available.

Part 1: Part of Speech Tagging (POS)

Importing Penn Treebank dataset using nltk
Explanation of Markov chain & Hidden Markov Model (HMM)
Explanation of Transition Matrix & Emission Probabilities
Implementing Viterbi algorithm
POS tagging using Viterbi algorithm
POS tagging using RNN
POS tagging using LSTM
POS tagging using GRU
Explaining LSTM & GRU gates
Discussing LSTM vs GRU
Comparing & analyzing results
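
The HMM and Viterbi steps listed above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual implementation: the function names (`train_hmm`, `log_prob`, `viterbi`), the add-one smoothing, the use of log-probabilities, and the toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Count start, transition and emission events from (word, tag) sentences."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            if prev is None:
                start[tag] += 1          # tag opens a sentence
            else:
                trans[prev][tag] += 1    # transition prev -> tag
            emit[tag][word.lower()] += 1 # tag emits word
            prev = tag
    tags = sorted(emit)
    vocab = {w for t in tags for w in emit[t]}
    return tags, vocab, start, trans, emit


def log_prob(counter, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log-probability (smoothing choice is an assumption)."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))


def viterbi(words, model):
    """Return the most likely tag sequence for `words` under the counted HMM."""
    if not words:
        return []
    tags, vocab, start, trans, emit = model
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unknown words
    words = [w.lower() for w in words]

    # best[i][t]: log-probability of the best path that ends in tag t at position i
    best = [{t: log_prob(start, t, n_tags) + log_prob(emit[t], words[0], n_words)
             for t in tags}]
    back = []  # back[i][t]: best previous tag for tag t at position i + 1
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + log_prob(trans[p], t, n_tags))
            scores[t] = (best[-1][prev]
                         + log_prob(trans[prev], t, n_tags)
                         + log_prob(emit[t], word, n_words))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Backtrack from the best final tag
    tag = max(tags, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Toy corpus (hypothetical); real training data would come from the treebank.
toy_corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
toy_model = train_hmm(toy_corpus)
print(viterbi(["the", "cat", "barks"], toy_model))
```

Working in log space avoids numerical underflow on long sentences, and the extra vocabulary slot gives unseen words a small non-zero emission probability so that decoding does not collapse when a test word never appeared in training.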

Part 2: Named Entity Recognition (NER)

Importing Penn Treebank dataset using nltk
Explanation of IOB tags
NER using adjusted Viterbi algorithm
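
To make the IOB scheme concrete: `B-X` marks the first token of an entity of type `X`, `I-X` marks a continuation of it, and `O` marks tokens outside any entity. The sketch below groups an IOB-tagged sentence back into entity spans. The helper name `iob_to_spans`, the example sentence, and the `PER`/`LOC` labels are hypothetical and are not taken from the project's tag set.

```python
def iob_to_spans(tokens, tags):
    """Group IOB-tagged tokens into (entity_type, text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")  # "B-PER" -> ("B", "-", "PER"); "O" -> ("O", "", "")
        # A span starts on B-, or on an I- whose type differs from the open span
        starts_new = prefix == "B" or (prefix == "I" and label != current_type)
        if current_tokens and (prefix == "O" or starts_new):
            spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if prefix in ("B", "I"):
            current_type = label
            current_tokens.append(token)
    if current_tokens:  # close a span that runs to the end of the sentence
        spans.append((current_type, " ".join(current_tokens)))
    return spans


example_tokens = ["John", "Smith", "visited", "New", "York"]
example_tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(iob_to_spans(example_tokens, example_tags))
```

Treating a stray `I-` tag as the start of a new span is a lenient design choice; a stricter decoder could reject such sequences instead.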

Report

The report is available here.

About

Part of Speech Tagging & Named Entity Recognition, CA3, Natural Language Processing Course (Spring 2022), University of Tehran
