Code-With-aashi / Document-Classification

Document-Classification

Steps used in the project

Cleaning the data

Make text lowercase, remove text in square brackets, remove links, remove punctuation, and remove words containing numbers.
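The cleaning steps above can be sketched with regular expressions. This is a minimal sketch, not necessarily the notebook's own code: the function name `clean_text` and the exact patterns are assumptions, and the final whitespace collapse is an extra tidy-up step not listed above.

```python
import re
import string

def clean_text(text: str) -> str:
    """Apply the listed cleaning steps to one document (illustrative only)."""
    text = text.lower()                                  # make text lowercase
    text = re.sub(r"\[.*?\]", "", text)                  # remove text in square brackets
    text = re.sub(r"https?://\S+|www\.\S+", "", text)    # remove links
    text = re.sub(f"[{re.escape(string.punctuation)}]", "", text)  # remove punctuation
    text = re.sub(r"\w*\d\w*", "", text)                 # remove words containing numbers
    return re.sub(r"\s+", " ", text).strip()             # assumed extra: collapse whitespace

print(clean_text("Visit https://example.com [ad] for the Q3 report, NOW!"))
# visit for the report now
```

Links are removed before punctuation so that the URL is still recognisable when its pattern runs.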

Removing frequent words

Remove the most frequently occurring words in the corpus.
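One way to do this is to count every word across the corpus and drop the top few. The sketch below assumes documents are already cleaned, whitespace-separated strings; `remove_frequent_words` and the `top_n` cutoff are hypothetical names and values, since the source does not say how many words are removed.

```python
from collections import Counter

def remove_frequent_words(docs, top_n=10):
    """Drop the top_n most common words, counted across all documents."""
    counts = Counter(word for doc in docs for word in doc.split())
    frequent = {word for word, _ in counts.most_common(top_n)}
    return [" ".join(w for w in doc.split() if w not in frequent) for doc in docs]

docs = ["the cat sat on the mat", "the dog sat on the log"]
print(remove_frequent_words(docs, top_n=1))
# ['cat sat on mat', 'dog sat on log']
```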

Removing rare words

Remove words that occur only rarely in the corpus.
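Rare words can be filtered with the same corpus-wide counts. This is a sketch under assumptions: the name `remove_rare_words` and the `min_count` threshold are illustrative, and the source does not state which cutoff the project uses.

```python
from collections import Counter

def remove_rare_words(docs, min_count=2):
    """Keep only words that appear at least min_count times in the corpus."""
    counts = Counter(word for doc in docs for word in doc.split())
    return [" ".join(w for w in doc.split() if counts[w] >= min_count) for doc in docs]

docs = ["the cat sat on the mat", "the dog sat on the log"]
print(remove_rare_words(docs, min_count=2))
# ['the sat on the', 'the sat on the']
```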

Stemming

Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form.
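To make the idea concrete, here is a deliberately crude suffix-stripping stemmer. It is not the Porter algorithm and not necessarily what the notebook uses; the `SUFFIXES` rule set and the name `crude_stem` are hypothetical. A real project would more likely rely on a library stemmer such as NLTK's `PorterStemmer`.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, deliberately tiny rule set

def crude_stem(word: str) -> str:
    """Strip the first matching suffix, leaving at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([crude_stem(w) for w in ["jumping", "jumped", "jumps", "studies"]])
# ['jump', 'jump', 'jump', 'studi']
```

Note that "studi" is not an English word: a stemmer only chops characters, so its output is not guaranteed to be valid vocabulary.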

Lemmatization

Lemmatization is similar to stemming in that it reduces inflected words to a base form, but it differs in that it makes sure the resulting root word (also called the lemma) belongs to the language.
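The difference from stemming can be illustrated with a dictionary lookup, which always returns a real word. This is a toy sketch only: the `LEMMAS` table and `lookup_lemma` are hypothetical, and a real lemmatizer (for example NLTK's `WordNetLemmatizer`) consults a full lexical database rather than four entries.

```python
# Hypothetical toy dictionary mapping inflected forms to their lemmas.
LEMMAS = {"studies": "study", "mice": "mouse", "ran": "run", "feet": "foot"}

def lookup_lemma(word: str) -> str:
    """Return the dictionary form of a word, or the word itself if unknown."""
    return LEMMAS.get(word, word)

print([lookup_lemma(w) for w in ["studies", "mice", "ran", "cat"]])
# ['study', 'mouse', 'run', 'cat']
```

Here "studies" becomes "study", a valid word, where simple suffix stripping would typically leave a non-word such as "studi".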

Splitting the dataset
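The source does not say how the split is done, so the following is only a sketch of a common approach: shuffle the samples with a fixed seed and hold out a fraction for testing. The function name `train_test_split_simple`, the 80/20 ratio, and the seed are all assumptions.

```python
import random

def train_test_split_simple(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and split it into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split_simple(range(10))
print(len(train), len(test))
# 8 2
```

Shuffling before splitting avoids a test set made up only of the last rows of the file, which matters when the data is sorted by class.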

Preparing the Model

Languages

Jupyter Notebook 100.0%