dehaoterryzhang / Yelp_Sentiment_Analysis

Sentiment analysis on the Yelp dataset using Logistic Regression

Sentiment Analysis On Yelp Dataset

Medium Post

Check out my Medium post "Sentiment Classification with Logistic Regression — Analyzing Yelp Reviews" here.

Kernel

Check out my Kaggle kernel here.

Overview

I built a sentiment classification model using logistic regression and tried several strategies to improve on the simple baseline. Of those strategies, including bigrams as features produced the largest improvement in F1 score. For both the simple model and the improved model, I also analyzed the most important textual features.
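To illustrate why bigrams can help, here is a minimal sketch in plain Python that counts word n-grams. The `ngram_counts` helper and the sample review are hypothetical and are not taken from the kernel, which may rely on a library vectorizer instead. The point is only that a bigram such as "not good" becomes a single feature, while unigrams alone see "not" and "good" separately.

```python
from collections import Counter


def ngram_counts(text, n_values=(1, 2)):
    """Count word n-grams for each n in n_values (1 = unigrams, 2 = bigrams).

    Hypothetical helper for illustration: tokenization is a plain
    lowercase whitespace split, with no punctuation or stop-word handling.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        # Slide a window of n tokens across the review.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


review = "the food was not good"  # made-up example review
unigrams = ngram_counts(review, n_values=(1,))
with_bigrams = ngram_counts(review, n_values=(1, 2))

# Features that only exist once bigrams are included.
new_features = sorted(set(with_bigrams) - set(unigrams))
print(new_features)
```

With unigrams only, the word "good" looks like positive evidence; the extra "not good" feature gives a linear model a way to learn the negation.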

Motivation

Sentiment analysis is a highly effective tool for a business: it captures overall brand perception and also reveals customer attitudes and emotions toward a specific product line or service. This data-driven approach helps the business better understand its customers and detect subtle shifts in their opinions so it can meet changing demand.

Procedure

  • Peek at the Review Data
  • Convert Stars into Categories
  • Decide on Evaluation Metric
  • Text Processing & Vectorization
  • Model Development and Evaluation
  • Visualize Feature Importance
  • Analyze Improvement Strategies
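Two of the steps above, converting stars into categories and scoring with F1, can be sketched in plain Python. This is an illustration under stated assumptions, not the kernel's actual code: the thresholds in `stars_to_label` (4 to 5 stars positive, 1 to 2 negative, 3 dropped) are hypothetical, and the ratings and predictions are made-up toy data.

```python
def stars_to_label(stars):
    """Map a star rating to a sentiment category.

    Assumed thresholds for illustration only; the kernel may split
    the ratings differently.
    """
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return None  # 3-star reviews are treated as neutral and dropped here


def f1_score(y_true, y_pred, positive="positive"):
    """F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


stars = [5, 1, 4, 3, 2, 5]  # toy ratings, not real Yelp data
labels = [lab for lab in map(stars_to_label, stars) if lab is not None]

# Toy model predictions for the five remaining reviews.
predictions = ["positive", "positive", "positive", "negative", "negative"]
score = f1_score(labels, predictions)
print(labels, round(score, 3))
```

F1 balances precision and recall, which makes it more informative than plain accuracy when one sentiment class is much more common than the other.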

Installation

I did my analysis in a Kaggle kernel and I recommend you do the same, for two main reasons:

  1. The Yelp dataset is quite large, but it is pre-loaded in Kaggle kernels, so you don't need to download it locally.
  2. Most libraries are already available in this environment, so there is no need to install them locally.
