This project analyzes and predicts customer churn of a music streaming service using Spark on a large dataset.
I took the starter code for this repository from a Udacity assignment project and modified it to the present form, which deviates significantly from the original form; see starter
.
The project focuses on an imaginary music streaming service, similar to Spotify, where users can listen to streamed music. In that service:
- We have: (1) free-tier users and (2) premium users who pay a subscription.
- Every time an user is involved in an event, it is logged with a timestamp; example events:
songplay
,logout
,like
,ad_heard
,downgrade
, etc.
The goal is to predict customer churn, either (1) as a downgrade from the premium to free plan or (2) in form of a user leaving the service. With churn predictions, the company can target those users with incentives, such as discounts, etc.
π§ On-going work.
Contents:
- A
- B
- ...
π§ TBD.
π§ TBD.
The directory of the project consists of the following files:
.
βββ Instructions.md
...
If you already have a Python environment with the usual ML libraries and you'd like to add PySpark:
# Install PySpark manually
python -m pip install pyspark
python -m pip install findspark
Alternatively, if you want to create a new Python environment (recommended), you can do it with conda:
# Create an environment
conda create -n sparkify python=3.9 pip
conda activate sparkify
# Install pip-tools
python -m pip install -U pip-tools
# Generate pinned requirements.txt
# PySpark is listed there
pip-compile requirements.in
# Install pinned requirements, as always
python -m pip install -r requirements.txt
# If required, add new dependencies to requirements.in and sync
# i.e., update environment
pip-compile requirements.in
pip-sync requirements.txt
python -m pip install -r requirements.txt
# To track any changes and versions you have
conda env export > conda.yaml
pip list --format=freeze > requirements.txt
# To delete the conda environment, if required
conda remove --name sparkify --all
π§ TBD.
π§ TBD.
π§ TBD.
π§ TBD.
π§ TBD.
Mikel Sagardia, 2023.
No guarantees.
If you find this repository useful, you're free to use it, but please link back to the original source.