This is the corresponding codes for WSDM CUP 2018 Music Recommendation Challenge's 1st place solution.
Please create following folders before testing:
-
input/training/source_data/
-
input/training/temporal_data/
-
input/validation/source_data/
-
input/validation/temporal_data/
-
temp_nn/
-
submission/
Put the data in the folder "source_data", then run script/run.sh, features will be extracted. For validation, you need to prepare data by hand, and create a "test_label.csv" file with "target" field.
The hyper-parameters is recorded in lgb_record.csv and nn_record.csv, you can try it directly. If everything is right, you should be able to get 0.744+ with LightGBM, and 0.742+ with 30-ensemble of NNs. 0.6 * LightGBM + 0.4 * NN should be able to get you ~0.749.
The code is tested on a small part of the data under python 2.7, if you find any bug, please contract me under the topic on Kaggle.
The versions of dependencies:
-
pandas: 0.20.1
-
sklearn: 0.18.1
-
keras: 2.0.4
-
lightgbm: 0.1
-
numpy: 1.12.1
-
scipy: 0.19.0
-
Tensorflow 1.0.1