- Introduction
- What's included
- File Descriptions
- Installation
- Licensing, Authors, and Acknowledgements
In this project, I take the role of a data scientist working for an online music streaming company called "Sparkify". Sparkify users are on either the free tier or a subscription. Both types of users can cancel their subscription at any time. Cancelling a subscription is called 'churn'. The project aims to build a model with the logistic regression algorithm that predicts which users are likely to churn based on their behaviour on Sparkify.
For detailed documentation, please refer to the blog post on Medium: Blog post
(project folder)/
├── mini_sparkify_event_data.zip
├── Sparkify.html
└── Sparkify.ipynb
- mini_sparkify_event_data.zip
  - Dataset containing users' behaviour on Sparkify. Since the file is zipped, the Python program automatically unzips it before reading.
- Sparkify.html
  - The Jupyter Notebook exported to HTML format.
- Sparkify.ipynb
  - The Jupyter Notebook containing the source code for the data science process of predicting user churn.
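The automatic unzip step mentioned above can be sketched roughly as follows. This is a minimal illustration using only the standard `zipfile` module, not necessarily the exact code in the notebook. The helper name `extract_dataset` and the `target_dir` parameter are assumptions made for this example.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, target_dir="."):
    """Extract every member of the archive and return the extracted paths.

    Hypothetical helper for illustration; the notebook may unzip differently.
    """
    target = Path(target_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return [target / name for name in archive.namelist()]


# Only runs when the archive sits next to this script or notebook.
archive_path = Path("mini_sparkify_event_data.zip")
if archive_path.exists():
    for extracted in extract_dataset(archive_path):
        print("Extracted:", extracted)
```

The returned paths can then be passed to whatever reader the notebook uses to load the event data.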
The code should run without issues on Python 3.6.3 with Spark 2.4.3.
Python libraries used in the project:
- pandas
- numpy
- pyspark
- json
- datetime
- matplotlib
- sys
- zipfile
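Before opening the notebook, it may help to check that the libraries listed above can be imported in the current environment. The snippet below is a small sketch of such a check. The function name `missing_modules` is an assumption made for this example and is not part of the project.

```python
import importlib.util
import sys

# Libraries listed in this README.
REQUIRED = [
    "pandas", "numpy", "pyspark", "json",
    "datetime", "matplotlib", "sys", "zipfile",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version:", sys.version.split()[0])
missing = missing_modules(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All listed libraries are available.")
```

`json`, `datetime`, `sys`, and `zipfile` ship with Python, so only the third-party packages (`pandas`, `numpy`, `pyspark`, `matplotlib`) would normally need installing.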
The code is released under the MIT License. Credit must be given to Udacity for the data.