metabrainz / listenbrainz-labs

A collection of tools/scripts to explore the ListenBrainz data using Apache Spark.

Note: This repository is archived and merged into listenbrainz-server. Please open all pull requests in the listenbrainz-server codebase.

Setup steps required for the scripts to run correctly:

Set env var:

export PYSPARK_PYTHON=$(which python3)

Install required modules:

pip3 install -r requirements.txt

Install Java and Scala:

apt-get install default-jdk scala

Install Spark (download the 2.3.0 tgz for Hadoop and unpack it in /usr/local/spark).
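The setup steps above can be sanity-checked before submitting a job. The following is a minimal sketch, not part of this repository: the function name `missing_prerequisites` and its parameters are hypothetical, and it assumes the Spark tarball was unpacked so that `bin/spark-submit` sits directly under /usr/local/spark.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which, exists=os.path.exists,
                          spark_home="/usr/local/spark"):
    """Return a list of setup problems; an empty list means the checks passed.

    Hypothetical helper: `env`, `which` and `exists` are injectable so the
    checks can be exercised without touching the real system.
    """
    env = os.environ if env is None else env
    problems = []

    # PYSPARK_PYTHON should name the python3 interpreter Spark will use.
    if not env.get("PYSPARK_PYTHON"):
        problems.append("PYSPARK_PYTHON is not set")

    # java and scala are installed by the apt-get step.
    for tool in ("java", "scala"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    # Assumption: the tarball contents live directly under spark_home.
    submit = os.path.join(spark_home, "bin", "spark-submit")
    if not exists(submit):
        problems.append(f"{submit} does not exist")

    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the current environment; printing the returned list shows which step still needs attention.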

To run the scripts:

spark-submit --master spark://195.201.112.36:7077 --executor-memory=29g $(pwd)/<script>

For example:

spark-submit --master spark://195.201.112.36:7077 --executor-memory=29g $(pwd)/train_models.py df models
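If the same submission is repeated for several scripts, the command line can be assembled programmatically. This is only a sketch under stated assumptions: `build_submit_command` and `submit` are hypothetical helpers that are not in the repository, the master URL and executor memory are copied from the commands above, and `spark-submit` is assumed to be on PATH.

```python
import os
import subprocess

# Values taken from the commands in this README.
MASTER_URL = "spark://195.201.112.36:7077"
EXECUTOR_MEMORY = "29g"


def build_submit_command(script, *script_args, master=MASTER_URL,
                         executor_memory=EXECUTOR_MEMORY, cwd=None):
    """Build the spark-submit argument list for a script in the working directory."""
    # Mirrors the shell form: the script path is the current directory plus the name.
    cwd = os.getcwd() if cwd is None else cwd
    return [
        "spark-submit",
        "--master", master,
        f"--executor-memory={executor_memory}",
        os.path.join(cwd, script),
        *script_args,
    ]


def submit(script, *script_args):
    """Run spark-submit and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(build_submit_command(script, *script_args), check=True)
```

For instance, `submit("train_models.py", "df", "models")` would run the same command as the train_models.py example above.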

License: GNU General Public License v2.0