urvish7 / Amazon_Vine_Analysis

We’ve been tasked with analyzing Amazon reviews written by members of the paid Amazon Vine program. The Amazon Vine program is a service that allows manufacturers and publishers to receive reviews for their products. Companies like SellBy pay a small fee to Amazon and provide products to Amazon Vine members, who are then required to publish a review. In this project, we’ll have access to approximately 50 datasets. Each one contains reviews of a specific product category, from clothing apparel to wireless products. We’ll need to pick one of these datasets and use PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into pgAdmin. Next, we’ll use PySpark, Pandas, or SQL to determine if there is any bias toward favorable reviews from Vine members in the dataset. Then, we’ll write a summary of the analysis for Jennifer to submit to the SellBy stakeholders.

Amazon_Vine_Analysis

Purpose and Overview of the analysis of the Vine program:

Since our work with Jennifer on the SellBy project was so successful, we’ve been tasked with another, larger project: analyzing Amazon reviews written by members of the paid Amazon Vine program. The Amazon Vine program is a service that allows manufacturers and publishers to receive reviews for their products. Companies like SellBy pay a small fee to Amazon and provide products to Amazon Vine members, who are then required to publish a review.

In this project, we’ll have access to approximately 50 datasets. Each one contains reviews of a specific product category, from clothing apparel to wireless products. We’ll need to pick one of these datasets and use PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into pgAdmin. Next, we’ll use PySpark, Pandas, or SQL to determine if there is any bias toward favorable reviews from Vine members in the dataset. Then, we’ll write a summary of the analysis for Jennifer to submit to the SellBy stakeholders.

Deliverable 1: Perform ETL on Amazon Product Reviews:

Using our knowledge of the cloud ETL process, we’ll create an AWS RDS database with tables in pgAdmin.

The file used for this analysis is Amazon_Reviews_ETL.ipynb.
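As a rough illustration of the transform step, the sketch below splits a few review rows into the four tables this deliverable loads (customers, products, review IDs, and Vine). It is a plain-Python stand-in, not the notebook's PySpark code: the sample rows, the column names, the `customer_count` column, and the `build_tables` helper are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample rows; the real input is one of the Amazon review datasets.
reviews = [
    {"review_id": "R1", "customer_id": 101, "product_id": "P1", "product_title": "Game A",
     "star_rating": 5, "helpful_votes": 18, "total_votes": 20, "vine": "Y"},
    {"review_id": "R2", "customer_id": 101, "product_id": "P2", "product_title": "Game B",
     "star_rating": 3, "helpful_votes": 2, "total_votes": 4, "vine": "N"},
    {"review_id": "R3", "customer_id": 102, "product_id": "P1", "product_title": "Game A",
     "star_rating": 4, "helpful_votes": 0, "total_votes": 1, "vine": "N"},
]


def build_tables(rows):
    """Split flat review rows into four table-shaped lists of dicts."""
    # Customers: one row per customer with the number of reviews written (assumed column).
    counts = Counter(r["customer_id"] for r in rows)
    customers = [{"customer_id": cid, "customer_count": n}
                 for cid, n in sorted(counts.items())]

    # Products: keep the first title seen for each product_id, dropping duplicates.
    titles = {}
    for r in rows:
        titles.setdefault(r["product_id"], r["product_title"])
    products = [{"product_id": pid, "product_title": t} for pid, t in titles.items()]

    # Review IDs: the keys that link a review to its customer and product.
    review_ids = [{k: r[k] for k in ("review_id", "customer_id", "product_id")}
                  for r in rows]

    # Vine: the rating and vote columns used later in the bias analysis.
    vine_cols = ("review_id", "star_rating", "helpful_votes", "total_votes", "vine")
    vine = [{k: r[k] for k in vine_cols} for r in rows]

    return {"customers": customers, "products": products,
            "review_id": review_ids, "vine": vine}


tables = build_tables(reviews)
for name, table in tables.items():
    print(name, len(table))
```

In the notebook, the equivalent work would be done on PySpark data frames and written to the RDS PostgreSQL instance; those connection details are not reproduced here.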

Extracting the Amazon review data set into the data frames:

The final customer table data frame:

The product data frame after dropping duplicates:

The review ID data frame:

The Vine table data frame:

The four data frames loaded into the database:

The review_id table:

The products table in pgAdmin:

The customers table in pgAdmin:

The Vine table in pgAdmin:

Deliverable 2: Determine Bias of Vine Reviews:

Using our knowledge of PySpark, Pandas, or SQL, we’ll determine if there is any bias towards reviews that were written as part of the Vine program. For this analysis, we'll determine if having a paid Vine review makes a difference in the percentage of 5-star reviews.

The file used for this analysis is Vine_Review_Analysis.ipynb.
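One way to express the filtering and counting steps in this deliverable is a single SQL query. The sketch below runs it against an in-memory SQLite table so that it is self-contained. The sample rows, the column names, and the minimum of 20 total votes are assumptions for illustration; the 50% helpful-vote ratio is the one named in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE vine_table (
           review_id TEXT PRIMARY KEY,
           star_rating INTEGER,
           helpful_votes INTEGER,
           total_votes INTEGER,
           vine TEXT
       )"""
)

# Hypothetical rows: (review_id, star_rating, helpful_votes, total_votes, vine)
rows = [
    ("R1", 5, 18, 20, "Y"),  # kept: paid, 5 stars
    ("R2", 3, 30, 40, "Y"),  # kept: paid, not 5 stars
    ("R3", 5, 25, 30, "N"),  # kept: unpaid, 5 stars
    ("R4", 4, 5, 30, "N"),   # dropped: helpful ratio below 50%
    ("R5", 5, 2, 3, "N"),    # dropped: too few total votes
    ("R6", 2, 15, 25, "N"),  # kept: unpaid, not 5 stars
]
conn.executemany("INSERT INTO vine_table VALUES (?, ?, ?, ?, ?)", rows)

MIN_TOTAL_VOTES = 20  # assumed threshold, not stated in this README

query = """
SELECT vine,
       COUNT(*) AS total_reviews,
       SUM(CASE WHEN star_rating = 5 THEN 1 ELSE 0 END) AS five_star_reviews
FROM vine_table
WHERE total_votes >= ?
  AND CAST(helpful_votes AS REAL) / total_votes >= 0.5
GROUP BY vine
ORDER BY vine
"""

# Collect counts and the 5-star percentage for paid ("Y") and unpaid ("N") reviews.
summary = {}
for vine, total, five_star in conn.execute(query, (MIN_TOTAL_VOTES,)):
    summary[vine] = {
        "total": total,
        "five_star": five_star,
        "pct_five_star": round(100.0 * five_star / total, 2),
    }

print(summary)
```

A query of the same shape could be run against the Vine table in pgAdmin, adjusted to the real schema.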

The Vine data frame for the analysis:

Creating the Vine data frame:

The total votes data frame:

The data frame of reviews where helpful votes are at least 50% of total votes:

The paid Vine data frame:

The unpaid Vine data frame:

Deliverable 3

Results:

The total calculations of the paid reviews:

  • The total paid review count is 94.
  • The number of paid reviews with 5 stars is 48.
  • The paid 5-star review percentage is 51.06%.

The total calculations of the unpaid reviews:

  • The total unpaid review count is 40471.

  • The number of unpaid reviews with 5 stars is 15663.

  • The unpaid 5-star review percentage is 38.7%.

  • How many Vine reviews and non-Vine reviews were there?

    Vine: 94

    Non-Vine: 40471

  • How many Vine reviews were 5 stars? How many non-Vine reviews were 5 stars?

    Vine: 48

    Non-Vine: 15663

  • What percentage of Vine reviews were 5 stars? What percentage of non-Vine reviews were 5 stars?

    Vine: 51.06%

    Non-Vine: 38.7%
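The percentages reported above follow directly from the counts. A minimal check, using only the figures in this summary (the helper name `pct_five_star` is hypothetical):

```python
def pct_five_star(five_star_count, total_count):
    """Return the share of 5-star reviews as a percentage, rounded to 2 decimals."""
    return round(100.0 * five_star_count / total_count, 2)


# Counts taken from the results in this README.
vine_pct = pct_five_star(48, 94)
non_vine_pct = pct_five_star(15663, 40471)

print(f"Vine: {vine_pct}%")          # 51.06
print(f"Non-Vine: {non_vine_pct}%")  # 38.7
```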

From these findings, we infer that there was no clear bias in the Vine members' reviews for the video game dataset. About 51 percent of paid Vine reviews were 5 stars, compared with 38.7 percent of non-Vine reviews. The Vine percentage is higher by about 12 percentage points, but it is based on only 94 reviews against 40471 non-Vine reviews, so we consider the disparity too small to indicate bias.

The data frame of the 5-star reviews:

Languages

Language:Jupyter Notebook 100.0%