ryanschaub / Sentiment-Analysis-on-IMDB-Film-Reviews

Sentiment analysis is a popular Natural Language Processing (NLP) task that extracts the overall opinion expressed in a text. In this project, we perform sentiment analysis on IMDB movie reviews, classifying each review as positive or negative. When dealing with text data, a prevalent issue is how to encode words as numeric features that a classification algorithm can use. Because words don’t naturally lend themselves to a numeric ordering, many approaches to featurizing text have been proposed. In this project, we use the bag-of-words model, which uses the count of each word in a text as a feature. We begin with logistic regression, followed by decision tree and random forest models. We regularize each model and tune its parameters, then use AdaBoost classifiers with our decision tree and random forest models as base estimators. Finally, we compare the performance of each model on our training and validation data sets.
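To make the bag-of-words idea concrete, here is a minimal sketch of the featurization step using only the Python standard library. The function names (`tokenize`, `build_vocabulary`, `bag_of_words`) and the two sample reviews are hypothetical and chosen for illustration; the project's own code is not shown here and may well rely on a library vectorizer such as scikit-learn's `CountVectorizer` instead.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(texts):
    """Map each distinct word in the corpus to a column index.

    Words are sorted so the mapping is deterministic.
    """
    words = sorted({word for text in texts for word in tokenize(text)})
    return {word: index for index, word in enumerate(words)}


def bag_of_words(text, vocabulary):
    """Return a count vector: entry i is how often vocabulary word i occurs in text.

    Words that are not in the vocabulary are ignored.
    """
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector


# Hypothetical two-review corpus, for illustration only.
reviews = ["A great film, great cast", "A dull film"]
vocabulary = build_vocabulary(reviews)
features = [bag_of_words(review, vocabulary) for review in reviews]
```

Each review becomes a fixed-length vector of word counts, and word order is discarded. Vectors of this shape are what a classifier such as logistic regression, a decision tree, or a random forest would take as input.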

