tenniskit / capstone-project-sparkify

Capstone Project - User churn on Sparkify

Table of Contents

  1. Introduction
  2. What's included
  3. File Descriptions
  4. Installation
  5. Licensing, Authors, and Acknowledgements

Introduction

This project imagines that I am a data scientist working for an online music streaming company called "Sparkify". Users on Sparkify are on either the free tier or a paid subscription. Both types of users can cancel at any time, and cancelling is called 'churn'. The project aims to build a logistic regression model that predicts which users are likely to churn based on their behaviour on Sparkify.

For detailed documentation, please refer to the blog post on Medium: Blog post

What's included

(project folder)/
├── mini_sparkify_event_data.zip
├── Sparkify.html
└── Sparkify.ipynb

File Descriptions

  1. mini_sparkify_event_data.zip
    • Dataset containing users' behaviour on Sparkify. The file is zipped, so the Python code unzips it automatically before reading it.
  2. Sparkify.html
    • The Jupyter notebook exported to HTML.
  3. Sparkify.ipynb
    • The Jupyter notebook containing the source code for the data science process of predicting user churn.
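The automatic unzip step mentioned for the dataset could look roughly like the sketch below. It is only an illustration built on the standard `zipfile` module; the helper name `unzip_dataset` and its `target_dir` parameter are assumptions, and the notebook may do this differently.

```python
import zipfile
from pathlib import Path


def unzip_dataset(zip_path, target_dir="."):
    """Extract every member of zip_path into target_dir.

    Hypothetical helper for illustration; not taken from the notebook.
    Returns the paths of the extracted members.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        names = archive.namelist()
    return [target / name for name in names]


# Example call (assumes the archive sits in the current directory):
# extracted = unzip_dataset("mini_sparkify_event_data.zip")
```

The returned paths can then be handed to whatever reader the notebook uses, so the archive contents do not need to be known in advance.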

Installation

The code should run with no issues using Python 3.6.3 with Spark 2.4.3.

Python libraries used in the project:

  1. pandas
  2. numpy
  3. pyspark
  4. json
  5. datetime
  6. matplotlib
  7. sys
  8. zipfile
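Of the libraries listed above, json, datetime, sys, and zipfile ship with Python, so only pandas, numpy, pyspark, and matplotlib need to be installed separately. A quick way to see what is missing from an environment is sketched below; the helper name `missing_modules` is an assumption made for this example and is not part of the project.

```python
import importlib.util

# Libraries listed in this README.
REQUIRED = [
    "pandas", "numpy", "pyspark", "json",
    "datetime", "matplotlib", "sys", "zipfile",
]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example call:
# print(missing_modules(REQUIRED))
```

An empty list means every library can be imported. This only checks availability, not that the installed versions match the ones stated above.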

Licensing, Authors, and Acknowledgements

The code is released under the MIT License. Credit must be given to Udacity for the data.
