prateek22sri / POS-tagger-HMM-naive-bayes

Part-of-Speech tagger using word count, naive bayes and hmm approach


pos-tagger

CS B551 Fall 2016, Assignment #3

Names and user IDs:

  * Sarvothaman Madhavan - madhavas
  * Raghavendra Nataraj - natarajr
  * Prateek Srivastava - pratsriv

(Based on skeleton code by D. Crandall)

Training: During training, the following probabilities are estimated using the plug-in principle (relative frequencies, i.e. counts divided by total occurrences, are plugged in place of probabilities); a counting sketch follows the list:

  1. Initial state probabilities : out of all the sentences, how many times each part of speech started a sentence
  2. State transition probabilities : count of occurrences of each pair of POS tags in the training data
  3. Emission probabilities : for each POS tag, count of how many times each word occurred with that tag
  4. Complex state transition probabilities : count of occurrences of each triple of POS tags in the training data
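
The snippet below is a minimal sketch of this counting step, not the repository's actual code. It assumes the training data is an iterable of (words, tags) sentence pairs; the function name `train` and all variable names are illustrative.

```python
from collections import defaultdict

def train(sentences):
    """Plug-in estimation: relative frequencies stand in for probabilities.

    `sentences` is assumed to be a list of (words, tags) pairs, e.g.
    (("the", "dog", "ran"), ("det", "noun", "verb")).
    """
    initial = defaultdict(int)       # how often each tag starts a sentence
    transition = defaultdict(int)    # counts of (previous tag, current tag) pairs
    emission = defaultdict(int)      # counts of (tag, word) pairs
    triple = defaultdict(int)        # counts of (tag_{i-2}, tag_{i-1}, tag_i) triples
    tag_totals = defaultdict(int)    # total occurrences of each tag

    for words, tags in sentences:
        initial[tags[0]] += 1
        for i, (word, tag) in enumerate(zip(words, tags)):
            tag_totals[tag] += 1
            emission[(tag, word)] += 1
            if i >= 1:
                transition[(tags[i - 1], tag)] += 1
            if i >= 2:
                triple[(tags[i - 2], tags[i - 1], tag)] += 1

    # Plug-in estimates: divide each count by the relevant total.
    n_sentences = len(sentences)
    initial_p = {t: c / n_sentences for t, c in initial.items()}
    emission_p = {(t, w): c / tag_totals[t] for (t, w), c in emission.items()}
    # Approximation: divide pair counts by the total count of the previous tag.
    transition_p = {(p, t): c / tag_totals[p] for (p, t), c in transition.items()}
    # Approximation: divide triple counts by the count of the leading tag pair.
    triple_p = {(a, b, c): cnt / transition[(a, b)] for (a, b, c), cnt in triple.items()}
    return initial_p, transition_p, emission_p, triple_p
```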

Complex Model: For the complex model, we start by estimating P(S1|W) and P(S2|W) as special cases, since they do not have the same structure as the remaining probabilities, i.e. P(S3|W) onward. Once these two are saved as tau1 and tau2, every subsequent probability calculation looks up the previous tau values to estimate the current "level" probability and saves it as the tau for that level; a sketch of this recursion is given below.
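
The following is a hedged sketch of the level-by-level tau recursion described above, assuming the plug-in estimates from the earlier snippet. The function name `complex_marginals`, the `eps` floor for unseen events, and the per-level normalisation are assumptions for illustration, not the repository's implementation.

```python
def complex_marginals(words, initial_p, transition_p, triple_p, emission_p, tags):
    """Approximate P(S_i | W) level by level; taus[i][s] is the estimate for position i+1."""
    eps = 1e-10  # floor for unseen events

    # Level 1 (special case): P(S1 | W) is proportional to P(S1) * P(W1 | S1)
    tau1 = {s: initial_p.get(s, eps) * emission_p.get((s, words[0]), eps) for s in tags}
    taus = [tau1]

    if len(words) > 1:
        # Level 2 (special case): sum over S1 of tau1(S1) * P(S2 | S1) * P(W2 | S2)
        tau2 = {s: emission_p.get((s, words[1]), eps) * sum(
                    taus[0][p] * transition_p.get((p, s), eps) for p in tags)
                for s in tags}
        taus.append(tau2)

    # Levels 3..n: look up the two previous tau levels and the triple transitions
    for i in range(2, len(words)):
        tau = {s: emission_p.get((s, words[i]), eps) * sum(
                   taus[i - 2][p2] * taus[i - 1][p1] * triple_p.get((p2, p1, s), eps)
                   for p1 in tags for p2 in tags)
               for s in tags}
        taus.append(tau)

    # Normalise each level so the values can be read as probabilities
    for tau in taus:
        z = sum(tau.values()) or 1.0
        for s in tau:
            tau[s] /= z
    return taus
```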

Posterior Calculation: For the posterior calculation, we assume the HMM model, i.e. the posterior of a tag sequence factors into the initial, transition, and emission probabilities described above.
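
Under that assumption, the log posterior of a tag sequence can be written as log P(S1) + Σ log P(S_i | S_{i-1}) + Σ log P(W_i | S_i) (up to a constant). The sketch below computes this quantity; the function name `log_posterior` and the `eps` smoothing are illustrative assumptions.

```python
import math

def log_posterior(words, tag_seq, initial_p, transition_p, emission_p):
    """Log posterior of a tag sequence under the HMM factorisation (illustrative sketch)."""
    eps = 1e-10
    lp = math.log(initial_p.get(tag_seq[0], eps))
    for i, (word, tag) in enumerate(zip(words, tag_seq)):
        lp += math.log(emission_p.get((tag, word), eps))   # emission term
        if i >= 1:
            lp += math.log(transition_p.get((tag_seq[i - 1], tag), eps))  # transition term
    return lp
```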

Accuracy table for bc.test:

| Model      | Word Accuracy | Sentence Accuracy |
|------------|---------------|-------------------|
| Simplified | 93.96%        | 47.50%            |
| HMM        | 95.03%        | 54.05%            |
| Complex    | 92.61%        | 44.45%            |

Languages

Language: Python 100.0%