parnianf / POS-Tagging-And-NER-Using-RNN-LSTM-GRU-Viterbi-algorithm

Part of Speech Tagging & Named Entity Recognition, CA3, Natural Language Processing Course (Spring 2022), University of Tehran

POS-Tagging-And-NER-Using-RNN-LSTM-GRU-Viterbi-algorithm

In this project, we practice several sequence-labeling methods on two tasks, Part of Speech Tagging and Named Entity Recognition, and examine the differences and challenges of each.

Dataset

The dataset for this project is the Penn Treebank (PTB), which is accessible through the nltk library.

* To run the code with a precomputed emission matrix for the Viterbi algorithm (instead of rebuilding it), make sure the files “emission_ptb.csv” and “NER_emission_15.csv” are available.
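
As a rough illustration of the loading step, the sketch below reads the tagged sentences that nltk ships for the Penn Treebank and prepares them for training. The helper names (`load_ptb_sample`, `split_sentences`, `tag_frequencies`) and the 80/20 split are assumptions made for this example, not code taken from the notebook.

```python
from collections import Counter


def load_ptb_sample():
    """Return the Penn Treebank data bundled with nltk as tagged sentences.

    Each sentence is a list of (word, tag) pairs. nltk is imported lazily so
    the helpers below can also be used on any other tagged corpus.
    """
    import nltk
    nltk.download("treebank", quiet=True)  # does nothing if already present
    from nltk.corpus import treebank
    return [list(sent) for sent in treebank.tagged_sents()]


def split_sentences(tagged_sents, test_fraction=0.2):
    """Hold out the last `test_fraction` of the sentences (hypothetical split)."""
    n_test = int(len(tagged_sents) * test_fraction)
    cut = len(tagged_sents) - n_test
    return tagged_sents[:cut], tagged_sents[cut:]


def tag_frequencies(tagged_sents):
    """Count how often each POS tag occurs in the corpus."""
    return Counter(tag for sent in tagged_sents for _, tag in sent)
```

Calling `split_sentences(load_ptb_sample())` would give a train/test pair, and `tag_frequencies(train).most_common(5)` shows which tags dominate the training data.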

Part 1: Part of Speech Tagging (POS)

  • Importing Penn Treebank dataset using nltk
  • Explanation of Markov chain & Hidden Markov Model (HMM)
  • Explanation of Transition Matrix & Emission Probabilities
  • Implementing Viterbi algorithm
  • POS tagging using Viterbi algorithm
  • POS tagging using RNN
  • POS tagging using LSTM
  • POS tagging using GRU
  • Explaining LSTM & GRU gates
  • Discussing LSTM vs GRU
  • Comparing & analyzing results
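
The HMM steps listed above (transition matrix, emission probabilities, Viterbi decoding) can be sketched as follows. This is a minimal pure-Python version on a tiny hand-made corpus, not the notebook's implementation; the corpus, the `<s>` start symbol, the additive smoothing constant, and all function names are assumptions chosen for illustration.

```python
import math
from collections import defaultdict

# Tiny hand-made corpus (illustrative only, not PTB data).
CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
START = "<s>"  # hypothetical start-of-sentence state


def estimate_hmm(corpus, smoothing=1e-6):
    """Estimate log transition and emission probabilities by counting."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sent in corpus:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(emit)
    n_words = len(vocab) + 1  # one extra slot for unseen words

    def log_trans(prev, tag):
        total = sum(trans[prev].values())
        count = trans[prev].get(tag, 0)
        return math.log((count + smoothing) / (total + smoothing * len(tags)))

    def log_emit(tag, word):
        total = sum(emit[tag].values())
        count = emit[tag].get(word, 0)
        return math.log((count + smoothing) / (total + smoothing * n_words))

    return tags, log_trans, log_emit


def viterbi(words, tags, log_trans, log_emit):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []
    # best[i][t]: best log-probability of any tag path ending in t at position i
    best = [{t: log_trans(START, t) + log_emit(t, words[0]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log_trans(p, t)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_emit(t, words[i])
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

With `tags, lt, le = estimate_hmm(CORPUS)`, the call `viterbi(["the", "cat", "barks"], tags, lt, le)` recovers the tag path seen in training. Working in log space avoids numerical underflow on long sentences, which matters once the corpus is the size of the PTB.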

Part 2: Named Entity Recognition (NER)

  • Importing Penn Treebank dataset using nltk
  • Explanation of IOB tags
  • NER using an adjusted Viterbi algorithm
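
To make the IOB scheme concrete, the sketch below converts entity spans into IOB tags: B-X marks the first token of an entity of type X, I-X marks its continuation, and O marks tokens outside any entity. The README does not say how the Viterbi algorithm was adjusted for NER, so the `is_valid_transition` check is only one plausible constraint a decoder could enforce, shown here as an assumption. Both function names are hypothetical.

```python
def spans_to_iob(tokens, spans):
    """Convert entity spans to IOB tags.

    `spans` is a list of (start, end, label) triples with `end` exclusive;
    spans are assumed not to overlap.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def is_valid_transition(prev_tag, tag):
    """An I-X tag may only follow B-X or I-X of the same entity type."""
    if tag.startswith("I-"):
        label = tag[2:]
        return prev_tag in (f"B-{label}", f"I-{label}")
    return True


tokens = ["John", "Smith", "works", "at", "Google"]
iob = spans_to_iob(tokens, [(0, 2, "PER"), (4, 5, "ORG")])
```

Here `iob` pairs "John" with B-PER, "Smith" with I-PER, and "Google" with B-ORG. A decoder that assigns a very low transition score whenever `is_valid_transition` returns False would never output an I- tag without a matching B- tag before it.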

Report

The report is available here.
