Spark CR

by Jess Schueler

A demonstration of PySpark using Spotify data.

Technologies Used

  • Python
  • PySpark
  • SQL

Description

Uses PySpark to profile, clean, and briefly analyze a dataset of Spotify artists.

Setup/Installation Requirements

  • In the terminal, clone the GitHub repository using the following command:
    $ git clone https://github.com/jessgschueler/Spark-CR
    
  • In a virtual environment (venv), install the dependencies with pip install -r requirements.txt
  • Create a /data directory and run get_data.sh from inside it to fetch the data
  • Run the main.py file

Known Bugs

  • None at this time

License

MIT

Copyright (c) 6/24/22 Jess Schueler
