Udacity Nanodegree big data Capstone Project--Sparkify

Installation

You will need Pyspark SQL and Pyspark ML. The code should run with no issues using Python versions 3.*.

the goad is to predict churns based on user log data(a tiny subset (128MB) of the full dataset available (12GB)) from a music app.

The following are the files available in this repository:

Sparkify Project.ipynb - a notebook of Exploratory Data Analysis,Feature Engineering and Modeling to predict churns, and which is exported into Sparkify Project.html.

The main findings of the code can be found at the blog post available here.

Must give credit to the data from udacity,and thanks for all the instructions from udacity teams.

Language:HTML 77.8%Language:Jupyter Notebook 22.2%