burhanharoon / N-Gram-Language-Model

A Python-based n-gram language model that computes bigrams, sentence probability, Laplace-smoothed probability under a bigram model, and the model's perplexity.

N-Gram Model Description

The corpus for this task should be prepared by yourself. It should cover 10 different domains, with 50 distinct files per domain. You are supposed to implement the following Python functions.

The text files are not tokenized. You need to implement a function named tokenize() that takes a file path as its argument and returns the tokenized sentences.
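A minimal sketch of such a tokenizer, assuming a simple regex-based scheme (sentence splitting on `.!?` and lowercased word tokens; the task leaves the exact scheme open):

```python
import re

def tokenize(file_path):
    """Read a raw text file and return a list of tokenized sentences.

    Splitting on sentence-final punctuation and extracting lowercase
    word tokens are assumptions; swap in a real tokenizer (e.g. NLTK)
    if the corpus needs it.
    """
    with open(file_path, encoding="utf-8") as f:
        text = f.read().lower()
    sentences = re.split(r"[.!?]+", text)
    return [re.findall(r"[a-z0-9']+", s) for s in sentences if s.strip()]
```

For example, a file containing `Hello world. Second sentence!` would yield `[['hello', 'world'], ['second', 'sentence']]`.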

Write a function Ngram() that accepts two required arguments, n (the order of the n-gram model) and sentences, and returns the n-grams.
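One way this could look, assuming each sentence is padded with `<s>`/`</s>` boundary markers (a common convention for bigram models, not stated in the spec):

```python
def Ngram(n, sentences):
    """Return all order-n n-grams from a list of tokenized sentences.

    Padding with <s> / </s> is an assumption so that sentence-initial
    and sentence-final transitions are counted.
    """
    grams = []
    for sent in sentences:
        padded = ["<s>"] * (n - 1) + sent + ["</s>"]
        # Slide a window of width n over the padded sentence.
        grams.extend(tuple(padded[i:i + n]) for i in range(len(padded) - n + 1))
    return grams
```

With n = 2, `Ngram(2, [["a", "b"]])` produces the bigrams `(<s>, a)`, `(a, b)`, `(b, </s>)`.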

Write a function SentenceProb() that accepts a sentence and returns the probability of the given sentence under the bigram model.
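A sketch of the unsmoothed bigram probability, using the chain rule P(w1..wn) = Π P(wi | wi-1). Passing the bigram and unigram counts in as arguments (`bigram_counts`, `unigram_counts`) is an assumption; the original presumably keeps them as shared state:

```python
from collections import Counter

def SentenceProb(sentence, bigram_counts, unigram_counts):
    """Unsmoothed bigram probability of a tokenized sentence.

    bigram_counts / unigram_counts are Counters built from the training
    corpus (hypothetical parameter names, not from the original spec).
    """
    tokens = ["<s>"] + sentence + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if unigram_counts[prev] == 0:
            return 0.0  # unseen history word: zero probability
        # Maximum-likelihood estimate: count(prev, cur) / count(prev)
        prob *= bigram_counts[(prev, cur)] / unigram_counts[prev]
    return prob
```

For instance, with a training corpus of `a b` and `a c`, the sentence `a b` gets P(a|&lt;s&gt;) · P(b|a) · P(&lt;/s&gt;|b) = 1 · 1/2 · 1 = 0.5.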

Write a function SmoothSentenceProb() that accepts a sentence and returns the probability of the given sentence under the bigram model with Laplace smoothing.
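The Laplace-smoothed variant adds one to every bigram count and the vocabulary size to every denominator, so unseen bigrams get small but nonzero probability. As above, the count and vocabulary-size parameters are assumed names:

```python
def SmoothSentenceProb(sentence, bigram_counts, unigram_counts, vocab_size):
    """Bigram probability with add-one (Laplace) smoothing.

    P(cur | prev) = (count(prev, cur) + 1) / (count(prev) + V),
    where V (vocab_size) is the vocabulary size.
    """
    tokens = ["<s>"] + sentence + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= (bigram_counts[(prev, cur)] + 1) / (unigram_counts[prev] + vocab_size)
    return prob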

Write a method Perplexity() that calculates the perplexity score for a given sequence of sentences.
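Perplexity is the inverse probability of the test set normalized by the number of tokens, PP = P(w1..wN)^(-1/N); computing it in log space avoids numerical underflow on long texts. This sketch reuses the smoothed bigram estimate (same assumed count parameters as above):

```python
import math

def Perplexity(sentences, bigram_counts, unigram_counts, vocab_size):
    """Perplexity of tokenized sentences under the Laplace-smoothed
    bigram model, accumulated in log space to avoid underflow."""
    log_prob = 0.0
    n_tokens = 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        n_tokens += len(tokens) - 1  # number of bigram transitions
        for prev, cur in zip(tokens, tokens[1:]):
            p = (bigram_counts[(prev, cur)] + 1) / (unigram_counts[prev] + vocab_size)
            log_prob += math.log(p)
    # exp(-avg log prob) == probability^(-1/N)
    return math.exp(-log_prob / n_tokens)
```

Lower perplexity means the model finds the test sentences less surprising, i.e. it fits them better.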

Languages

Language:Python 100.0%