
Sparkify: Customer Churn Prediction of a Music Streaming Service Using Spark

This project analyzes and predicts customer churn of a music streaming service using Spark on a large dataset.

The starter code for this repository comes from a Udacity assignment project; I have modified it significantly from its original form (see starter).

The project focuses on an imaginary music streaming service, similar to Spotify, where users can listen to streamed music. In that service:

There are two kinds of users: (1) free-tier users and (2) premium users who pay a subscription.
Every time a user is involved in an event, the event is logged with a timestamp; example events: songplay, logout, like, ad_heard, downgrade, etc.

The goal is to predict customer churn, defined either (1) as a downgrade from the premium to the free plan or (2) as a user leaving the service. With churn predictions, the company can target the users at risk with incentives such as discounts.
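As an illustration of the churn definition above, the following is a minimal sketch in plain Python (not the Spark implementation of this project) that labels users from a small, made-up event log. The field names `user_id`, `event`, and `ts`, as well as the event name `cancel` for a user leaving the service, are assumptions for the example; only `downgrade` appears among the example events listed above.

```python
from collections import defaultdict

# Assumed event names that count as churn:
# "downgrade" is one of the example events in this README,
# "cancel" is a hypothetical name for a user leaving the service.
CHURN_EVENTS = {"downgrade", "cancel"}


def label_churn(events):
    """Return {user_id: 1 if the user has at least one churn event, else 0}."""
    labels = defaultdict(int)
    for event in events:
        user = event["user_id"]
        # Once a user is flagged as churned, the flag stays set.
        labels[user] |= int(event["event"] in CHURN_EVENTS)
    return dict(labels)


# Made-up event log: one record per logged event, with a timestamp.
events = [
    {"user_id": "u1", "event": "songplay", "ts": 1},
    {"user_id": "u1", "event": "downgrade", "ts": 2},
    {"user_id": "u2", "event": "songplay", "ts": 3},
    {"user_id": "u2", "event": "like", "ts": 4},
    {"user_id": "u3", "event": "cancel", "ts": 5},
]

print(label_churn(events))
```

On a large dataset the same idea would typically be expressed as a per-user aggregation over the event log in Spark rather than a Python loop.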

🚧 Ongoing work.

Contents:

A
B
...

Dataset

🚧 TBD.

How to Use This Project

🚧 TBD.

The directory of the project consists of the following files:

.
├── Instructions.md
...

Installing Dependencies for Custom Environments

If you already have a Python environment with the usual ML libraries and you'd like to add PySpark:

# Install PySpark manually
python -m pip install pyspark
python -m pip install findspark

Alternatively, if you want to create a new Python environment (recommended), you can do it with conda:

# Create an environment
conda create -n sparkify python=3.9 pip
conda activate sparkify

# Install pip-tools
python -m pip install -U pip-tools

# Generate pinned requirements.txt
# PySpark is listed there
pip-compile requirements.in

# Install pinned requirements, as always
python -m pip install -r requirements.txt

# If required, add new dependencies to requirements.in and sync
# i.e., update environment
pip-compile requirements.in
pip-sync requirements.txt
python -m pip install -r requirements.txt

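# Record the exact package versions of the current environment
# (note: the second command overwrites the requirements.txt generated by pip-compile)
conda env export > conda.yaml
pip list --format=freeze > requirements.txt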

# To delete the conda environment, if required
conda remove --name sparkify --all
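
After either installation path, it can help to confirm that the packages are visible to the active Python interpreter. The snippet below is a small sketch of such a check using only the standard library; the helper name `missing_packages` is hypothetical and not part of this repository.

```python
import importlib.util

# Packages the installation steps above are expected to provide.
REQUIRED = ("pyspark", "findspark")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("pyspark and findspark are importable.")
```

The check only looks the packages up without starting Spark, so it does not verify that a Java runtime is available; starting a local Spark session is still needed to confirm that.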

Notes on the Theory

🚧 TBD.

Notes on the Implemented Analysis and Modeling

🚧 TBD.

Results and Conclusions

🚧 TBD.

Next Steps, Improvements

🚧 TBD.

References and Links

🚧 TBD.

Authorship

Mikel Sagardia, 2023.
No guarantees.

If you find this repository useful, you're free to use it, but please link back to the original source.
