tasdikrahman / movieReviewsAnalysis

Some stupid Movie reviews analyzed and classified using nltk and scikitlearn

Movie Review Analysis

An analysis of the movie_reviews data set included in the nltk corpora. I will probably add some buzzwords here later on.


What is in this repo

  • An implementation of nltk.NaiveBayesClassifier trained against 5000 movie reviews. Implemented in nltkNB.ipynb
  • Using sklearn
    • Naive Bayes
      • MultinomialNB
      • BernoulliNB
    • Linear Model
      • LogisticRegression
      • SGDClassifier
    • SVM
      • SVC
      • LinearSVC
      • NuSVC

Implemented in scikitlearnNB.ipynb
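
As a rough illustration of what training one of the scikit-learn classifiers listed above can look like, here is a minimal sketch using MultinomialNB. The toy sentences, their labels, and the choice of CountVectorizer with make_pipeline are assumptions made for this example; they are not taken from scikitlearnNB.ipynb, which may wire things up differently.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data invented for illustration (not the movie_reviews corpus).
train_texts = [
    "a wonderful, moving film",
    "great acting and a great story",
    "boring plot and terrible acting",
    "a dull, awful mess",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Bag-of-words counts feeding a multinomial Naive Bayes model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Classify two unseen snippets and convert the result to plain strings.
predictions = [str(label) for label in model.predict(["great film", "terrible mess"])]
print(predictions)
```

Swapping MultinomialNB for BernoulliNB, LogisticRegression, SGDClassifier, or LinearSVC leaves the rest of the pipeline unchanged, which makes it easy to compare the methods side by side.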

  • Implemented a voting system to choose the best out of all the learning methods. Implemented in voting_process.ipynb
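
One common way to build such a voting system is a simple majority vote over the individual classifiers. The sketch below assumes that reading; the VoteClassifier class and the three stand-in classifiers are hypothetical names written for this example and are not copied from voting_process.ipynb.

```python
from collections import Counter


class VoteClassifier:
    """Combine several classifiers by majority vote (illustrative sketch)."""

    def __init__(self, *classifiers):
        # Each classifier is any callable mapping a feature dict to a label.
        self._classifiers = classifiers

    def _votes(self, features):
        return [classify(features) for classify in self._classifiers]

    def classify(self, features):
        # The label chosen by the most classifiers wins.
        return Counter(self._votes(features)).most_common(1)[0][0]

    def confidence(self, features):
        # Fraction of classifiers that agreed with the winning label.
        votes = self._votes(features)
        winner_count = Counter(votes).most_common(1)[0][1]
        return winner_count / len(votes)


# Hypothetical stand-ins for trained classifiers, each keyed on one word.
def by_great(features):
    return "pos" if features.get("great") else "neg"


def by_boring(features):
    return "neg" if features.get("boring") else "pos"


def always_pos(features):
    return "pos"


voter = VoteClassifier(by_great, by_boring, always_pos)
mixed_review = {"great": True, "boring": True}
print(voter.classify(mixed_review), voter.confidence(mixed_review))
```

Using an odd number of classifiers avoids ties when there are only two labels.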

Accuracy achieved

Classifier                   Accuracy achieved
nltk.NaiveBayesClassifier    73.0%
ScikitLearn implementations
  BernoulliNB                72.0%
  MultinomialNB              76.0%
  LogisticRegression         74.0%
  SGDClassifier              69.0%
  SVC                        48.0%
  LinearSVC                  74.0%
  NuSVC                      74.0%
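
Each percentage above is presumably the share of held-out reviews whose predicted label matched the true label. A minimal sketch of that calculation follows; the accuracy_percent helper and the sample labels are hypothetical and only show the arithmetic, not the notebooks' actual evaluation code.

```python
def accuracy_percent(predicted, actual):
    """Return the percentage of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Made-up labels for four test reviews: three of the four predictions match.
predicted = ["pos", "neg", "pos", "pos"]
actual = ["pos", "neg", "neg", "pos"]
print(accuracy_percent(predicted, actual))  # 75.0
```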

Requirements

The simplest (and suggested) way is to install the required packages and their dependencies using either Anaconda or Miniconda.

After that, run:

$ conda update conda
$ conda install scikit-learn nltk

Downloading the dataset

The dataset used in this repo is distributed as part of the nltk data collection, so it can be fetched with nltk's own downloader.

Start your Python interpreter and run:

>>> import nltk
>>> nltk.download('stopwords')
>>> nltk.download('movie_reviews') 

NOTE: You can check system specific installation instructions from the official nltk website

Check that everything works so far by starting your interpreter again and importing these:

>>> import nltk
>>> from nltk.corpus import stopwords, movie_reviews
>>> import sklearn
>>> 

If these imports work, you are good to go!
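
Once the imports succeed, the corpus can be turned into labeled feature dictionaries for a classifier. The sketch below is one way to do that under stated assumptions: document_features, load_labeled_reviews, and the "contains(word)" feature naming are hypothetical choices for this example and may differ from what nltkNB.ipynb does. The nltk import is done lazily so the feature function can be tried without the corpus installed.

```python
def document_features(words, vocabulary):
    """Map a tokenized document to {"contains(word)": bool} features."""
    present = {word.lower() for word in words}
    return {f"contains({word})": (word in present) for word in vocabulary}


def load_labeled_reviews():
    """Return [(list_of_words, category), ...] from the movie_reviews corpus.

    Needs nltk and the 'movie_reviews' data downloaded as described above.
    """
    from nltk.corpus import movie_reviews  # imported lazily on purpose

    return [
        (list(movie_reviews.words(fileid)), category)
        for category in movie_reviews.categories()
        for fileid in movie_reviews.fileids(category)
    ]


# Quick check on a hand-written token list (no corpus needed for this part).
sample = document_features(["A", "great", "movie"], ["great", "boring"])
print(sample)
```

A list of (feature dictionary, label) pairs built this way is the input shape that nltk.NaiveBayesClassifier.train expects.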


Running it

  1. Clone the repo
$ git clone https://github.com/prodicus/movieReviewsAnalysis
$ cd movieReviewsAnalysis
## run the ipython server (on newer installs the command is `jupyter notebook`)
$ ipython notebook

  2. Order of running
    1. nltkNB.ipynb
    2. scikitlearnNB.ipynb
    3. voting_process.ipynb

  3. Hack away!


So

"So what? Well, this is pretty basic!"

Yes, it is, but hey, we all start somewhere, right?

Psst. I am working on a spam filtering system. You know, the kind where you paste in an email and it tells you whether it is spam or not.

You can follow me on twitter @tasdikrahman to keep tabs on it.


Legal stuff

Hacked together by Tasdik Rahman under the MIT License.

You can find a copy of the License at http://prodicus.mit-license.org/

License:MIT License


Languages

Language:Jupyter Notebook 100.0%