natural-language-processing ngram-language-model text-generation author-identification

Authorship Identification and Text Generation

Objective

To build unigram, bigram and trigram language models that estimate the probability that a given piece of test text was written by a particular author, and to generate a short text in the style of a given author.

Software Requirements

Python 3

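Natural Language Toolkit (NLTK)

Download the Gutenberg corpus by calling nltk.download('gutenberg') in a Python session. Calling nltk.download() with no arguments opens the interactive downloader, from which the corpus can also be selected.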
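A minimal sketch of fetching the corpus and collecting one author's words is shown here. The helper names (download_gutenberg, files_for_author, load_author_words) are hypothetical and are not taken from code.py. The sketch assumes that Gutenberg file ids in NLTK start with the author's name followed by a hyphen, as in carroll-alice.txt, and that lowercasing the tokens is acceptable.

```python
def download_gutenberg():
    """Fetch only the Gutenberg corpus, skipping the interactive downloader."""
    import nltk  # imported lazily so the pure helper below works without NLTK
    nltk.download("gutenberg")


def files_for_author(fileids, author):
    """Return the file ids whose name starts with '<author>-' (assumed naming)."""
    prefix = author.lower() + "-"
    return [fileid for fileid in fileids if fileid.startswith(prefix)]


def load_author_words(author):
    """Return all word tokens from one author's Gutenberg files, lowercased."""
    from nltk.corpus import gutenberg
    words = []
    for fileid in files_for_author(gutenberg.fileids(), author):
        words.extend(word.lower() for word in gutenberg.words(fileid))
    return words
```

After download_gutenberg() has run once, load_author_words("carroll") would return the tokens of that author's files as a single list, ready to be used as training data.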

Usage

Menu.py is the front end of the project. Run it and enter the number of the functionality you want. It calls the corresponding function in code.py and prints the result.

Project Description

From the Gutenberg corpus available in NLTK we take the works of three authors: Bryant, Carroll and Shakespeare. For each author's work we build and train a unigram, a bigram and a trigram model. Given a piece of test text, the code calculates the unigram, bigram and trigram probabilities that the text belongs to each of the three authors. It can also generate text similar to the work of a given author, using the unigram, bigram and trigram models built from the corpus of that author's work.
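The idea described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation in code.py: the class NgramModel and the function most_likely_author are hypothetical names, add-one (Laplace) smoothing is an assumed choice for handling unseen n-grams, and log probabilities are used to avoid numerical underflow on longer texts. The two toy corpora stand in for the real author texts.

```python
import math
import random
from collections import Counter, defaultdict


class NgramModel:
    """Hypothetical n-gram model with add-one smoothing (n=1, 2 or 3)."""

    def __init__(self, n, tokens):
        self.n = n
        self.vocab = set(tokens) | {"</s>"}
        self.followers = defaultdict(Counter)  # context -> next-word counts
        self.context_counts = Counter()        # context -> total count
        for context, word in self._ngrams(tokens):
            self.followers[context][word] += 1
            self.context_counts[context] += 1

    def _ngrams(self, tokens):
        """Yield (context, word) pairs, padding with start and end markers."""
        padded = ["<s>"] * (self.n - 1) + list(tokens) + ["</s>"]
        for i in range(self.n - 1, len(padded)):
            yield tuple(padded[i - self.n + 1:i]), padded[i]

    def prob(self, word, context):
        """Smoothed P(word | context); one extra slot is kept for unknown words."""
        count = self.followers.get(context, Counter())[word]
        return (count + 1) / (self.context_counts[context] + len(self.vocab) + 1)

    def log_prob(self, tokens):
        """Log probability of a whole token sequence under this model."""
        return sum(math.log(self.prob(w, c)) for c, w in self._ngrams(tokens))

    def generate(self, max_words=20, seed=None):
        """Sample words in proportion to their observed counts."""
        rng = random.Random(seed)
        context = ("<s>",) * (self.n - 1)
        output = []
        for _ in range(max_words):
            options = self.followers.get(context)
            if not options:
                break
            word = rng.choices(list(options), weights=list(options.values()))[0]
            if word == "</s>":
                break
            output.append(word)
            if self.n > 1:
                context = context[1:] + (word,)
        return output


def most_likely_author(models, tokens):
    """Return the author whose model gives the test text the highest score."""
    return max(models, key=lambda author: models[author].log_prob(tokens))


# Toy stand-ins for the real corpora (assumption for illustration only).
corpora = {
    "carroll": "alice was beginning to get very tired".split(),
    "shakespeare": "to be or not to be that is the question".split(),
}
models = {author: NgramModel(2, tokens) for author, tokens in corpora.items()}
guess = most_likely_author(models, "to be or not".split())
sample = models["shakespeare"].generate(max_words=8, seed=0)
```

Building three models per author (n = 1, 2, 3) and comparing their scores separately would mirror the unigram, bigram and trigram results described above. Unsmoothed models would assign zero probability to any test text containing an unseen n-gram, which is why some form of smoothing is assumed here.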

About

Uses n-gram language models to predict the probability that a given text is the work of a certain author. Also generates text similar to the work of a given author.
