Savadogo / Spark-Capstone-Project

Udacity Nanodegree Big Data Capstone Project: Sparkify

Table of Contents

  1. Installation
  2. Project Motivation
  3. File Descriptions
  4. Results
  5. Licensing, Authors, and Acknowledgements

Installation

You will need PySpark, specifically its SQL and ML modules (pyspark.sql and pyspark.ml). The code should run without issues on any Python 3.x version.
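
As a quick sanity check before opening the notebook, the sketch below reports which of the required modules cannot be found in the current environment. It is not part of the repository; the helper name `missing_modules` is hypothetical and only the standard library is used, so it runs whether or not PySpark is installed.

```python
import importlib.util
import sys

# Modules named in the Installation section.
REQUIRED_MODULES = ["pyspark.sql", "pyspark.ml"]


def missing_modules(names):
    """Return the module names from `names` that cannot be located.

    Hypothetical helper for illustration; it does not import Spark itself
    beyond what is needed to resolve the module path.
    """
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when the parent package (e.g. pyspark) is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# The README states the code targets Python 3.
if sys.version_info < (3,):
    print("Python 3 is required.")

not_found = missing_modules(REQUIRED_MODULES)
if not_found:
    print("Missing modules:", ", ".join(not_found))
else:
    print("PySpark SQL and ML modules are available.")
```

If any module is reported missing, installing the `pyspark` package from PyPI normally provides both.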

Project Motivation

The goal is to predict user churn from the user log data of a music app. The project uses a tiny subset (128 MB) of the full dataset (12 GB).

File Descriptions

The following are the files available in this repository:

  • Sparkify Project.ipynb - a notebook covering Exploratory Data Analysis, Feature Engineering, and Modeling to predict churn. It is also exported as Sparkify Project.html.

Results

The main findings of the code can be found at the blog post available here.

Licensing, Authors, and Acknowledgements

Credit goes to Udacity for the data, and thanks to the Udacity team for all their instructions.
