proto-n / recsys-challenge-2018

Solution for the 2018 Spotify RecSys Challenge

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

recsys-challenge-2018

Solution for the 2018 Spotify RecSys Challenge by the team Definitive Turtles

How to run

  • place the million playlist dataset json files into the data_raw/million_playlist_dataset and the challenge set into the data_raw/challenge_set directories
  • The scripts 5-6-7 contain a variable named threads; set this as desired
  • run the python files 1-6 without parameters
  • run the script 7_rest.py with parameters 0-4 (e.g. for i in $(seq 0 4); do python 7_rest.py $i; done)
  • run python 8_merge.py
  • run python 9_format_and_fix.py output/merged.csv output/submission.csv

Requirements

  • Python 3.5 with standard scientific packages (pandas, numpy, scipy, etc.)
  • 16gb of RAM
  • About 40gb free space

Reference environment

We ran the models using the following python version and packages:

Python 3.5.2 (we used the conda environment)
pandas 0.22.0
numpy 1.14.0
matplotlib 2.0.2
scipy 1.0.0

About

Solution for the 2018 Spotify RecSys Challenge

License:Apache License 2.0


Languages

Language:Python 99.0%Language:Shell 1.0%