Capstone Project - User churn on Sparkify

Introduction
What's included
File Descriptions
Installation
Licensing, Authors, and Acknowledgements

Introduction

Suppose I am a data scientist working for an online music streaming company called "Sparkify". Sparkify users are either on the free tier or on a paid subscription, and both types of users can cancel the service at any time. Cancelling the service is called 'churn'. The project builds a Logistic Regression model that predicts which users are likely to churn based on their behaviour on Sparkify.
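To make the churn definition and the modelling step above more concrete, here is a minimal plain-Python sketch. It assumes each event is a dict with `userId` and `page` keys, and that a user has churned if any of their events is a `"Cancellation Confirmation"` page visit. Both the field names and that page value are assumptions, not taken from this README. The sketch also swaps Spark's logistic regression for a small hand-written gradient-descent version so it runs without a Spark session. It is not the notebook's code.

```python
import math
from collections import defaultdict

# Assumed churn marker: the page a user lands on after cancelling.
CHURN_PAGE = "Cancellation Confirmation"


def label_churn(events):
    """Return {userId: 1 if the user ever hit CHURN_PAGE, else 0}."""
    labels = defaultdict(int)
    for event in events:
        uid = event["userId"]
        labels[uid] = max(labels[uid], int(event["page"] == CHURN_PAGE))
    return dict(labels)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """Predicted churn probability for one feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss with respect to the logit.
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Per-user features (for example, a hypothetical count of "Thumbs Down" events per user) would be built from the same event list, paired with the labels from `label_churn`, and passed to `fit_logistic`. In a Spark setting, `pyspark.ml.classification.LogisticRegression` fills the same role on a feature DataFrame.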

For detailed documentation, please refer to the accompanying blog post on Medium: Blog post

What's included

(project folder)/
├── mini_sparkify_event_data.zip
├── Sparkify.html
└── Sparkify.ipynb

File Descriptions

mini_sparkify_event_data.zip
- Dataset of users' activity events on Sparkify. Because the file is zipped, the Python code in the notebook unzips it automatically before reading it.
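The unzip-before-reading step can be sketched with the standard-library `zipfile` module, which is listed among the project's libraries. The function name `unzip_dataset` and the skip-if-already-extracted behaviour are illustrative assumptions, not the notebook's code.

```python
import os
import zipfile


def unzip_dataset(zip_path, out_dir="."):
    """Extract every member of zip_path into out_dir and return the file paths.

    Members that already exist on disk are not extracted again, so re-running
    the notebook does not repeat the (potentially slow) extraction.
    """
    paths = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            target = os.path.join(out_dir, name)
            if not os.path.exists(target):
                # extract() creates out_dir and any parent folders as needed.
                zf.extract(name, out_dir)
            paths.append(target)
    return paths
```

The extracted JSON path could then be passed to `spark.read.json(path)`. This assumes the archive holds a JSON Lines file (one object per line), which is the layout Spark's JSON reader expects by default.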
Sparkify.html
- The Jupyter Notebook file in HTML format.
Sparkify.ipynb
- The Jupyter Notebook containing the source code for the data science process of predicting user churn.

Installation

The code should run without issues using Python 3.6.3 and Spark 2.4.3.

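Python libraries used in the project (json, datetime, sys and zipfile are part of the Python standard library; the others must be installed separately):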

pandas
numpy
pyspark
json
datetime
matplotlib
sys
zipfile

Licensing, Authors, and Acknowledgements

The code is released under the MIT License. Credit must be given to Udacity for the data.

tenniskit / capstone-project-sparkify
