zixiliuUSC / EE660-course-project-Amazon_sentiment_review_analysis

Amazon Review Sentiment Analysis

This project builds a natural language processing system that models the relation between Amazon review text and the corresponding product rating.

Built with: sklearn, Gensim, NLTK, numpy

Abstract

In this project, I use two feature extraction methods: TF-IDF and word2vec. word2vec learns semantic word vectors with an unsupervised model based on a shallow two-layer neural network. TF-IDF is a term-frequency weighting model (term frequency scaled by inverse document frequency), and I use PCA to reduce the dimensionality of its features. To deal with class imbalance, I use SMOTE to re-sample the data. For classification, I fine-tune and compare logistic regression, linear regression, decision tree, AdaBoost, and Gaussian Naive Bayes. For some of these models, I also use regularization to reduce the number of features. Finally, evaluation mainly uses the macro-averaged F1 score for overall performance and the per-class F1 score for comparing performance in each class. The following picture shows the model architecture.

[Figure: model architecture]
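The TF-IDF branch described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy reviews, the binary labels, `n_components=3`, and the logistic regression settings are all assumptions chosen so the example runs on its own. The SMOTE re-sampling step is left out because it needs the separate imbalanced-learn package.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Toy stand-in data (hypothetical, not taken from the Amazon dataset).
train_texts = [
    "great product works perfectly",
    "love it excellent quality",
    "excellent value great buy",
    "terrible waste of money",
    "broke after one day terrible",
    "poor quality do not buy",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative (assumed labels)
test_texts = ["excellent product love it", "terrible poor quality"]
test_labels = [1, 0]

# 1. TF-IDF features. PCA expects dense input, so the sparse matrix is
#    converted with .toarray(); this is only practical for small corpora.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts).toarray()
X_test = vectorizer.transform(test_texts).toarray()

# 2. PCA to reduce the feature dimension (3 components is an arbitrary choice).
pca = PCA(n_components=3, random_state=0)
X_train_red = pca.fit_transform(X_train)
X_test_red = pca.transform(X_test)

# 3. One of the compared classifiers: L2-regularized logistic regression.
clf = LogisticRegression(C=1.0, max_iter=1000)
clf.fit(X_train_red, train_labels)
pred = clf.predict(X_test_red)

# 4. Per-class F1 and macro F1 (the unweighted mean of the per-class scores).
per_class_f1 = f1_score(test_labels, pred, average=None, labels=[0, 1])
macro_f1 = f1_score(test_labels, pred, average="macro", labels=[0, 1])
print("per-class F1:", per_class_f1)
print("macro F1:", macro_f1)
```

Macro averaging weights every class equally regardless of how many samples it has, which is why it is a common choice for imbalanced data such as product ratings.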

Installation

Install all the packages according to their official documentation. This project is developed in Jupyter Notebook, which you can install by following this page.

Usage example

All source code is in the w2v_train directory. To reproduce the results in EE_660_Final_Project_F19_zixiliu.pdf, download the models from this page and place the pickle files accordingly. To train all the models from scratch, run preprocess.ipython first and then the other files.
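Loading a downloaded pickle file could look roughly like the sketch below. The helper `load_model` and the file name `demo_model.pkl` are hypothetical; the actual file names and locations are whatever the notebooks expect. The demo writes a stand-in object to a temporary directory so it runs without the real models.

```python
import pickle
import tempfile
from pathlib import Path


def load_model(path):
    """Load a pickled object from `path`. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Self-contained demo: write a stand-in object, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_model.pkl"  # hypothetical file name
    with open(demo_path, "wb") as f:
        pickle.dump({"name": "demo", "weights": [0.1, 0.2]}, f)
    restored = load_model(demo_path)

print(restored)
```

Pickled models generally need the same library versions that were used to save them, so a load error may point to a version mismatch rather than a corrupted file.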

Release History

  • 0.0.1
    • Move local project to GitHub.

Meta

Zixi Liu – @ZixiLiulinkedin

Distributed under the MIT license. See LICENSE for more information.

https://github.com/zixiliuUSC/EE660-course-project/blob/master/LICENSE.md
