zakharovas / RecSys2018

MIPT_MSU team RecSys Challenge 2018 solution

Requirements

We used Python 3.5.

Install requirements from requirements.txt

You will also need Catboost, Starspace, Vowpal Wabbit and Python Transformer
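
Before starting the long-running steps, it can help to confirm that the command-line tools are reachable. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper that prints every command name that is not found on PATH.

```shell
#!/usr/bin/env bash
# Sketch: list which of the given commands are not on PATH.
# missing_tools is a hypothetical helper name, not part of the repository.
missing_tools() {
    local tool
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_tools python3 vw` prints nothing when both Python 3 and the Vowpal Wabbit command-line tool (`vw`) are installed. Starspace and Catboost are passed to the scripts as paths, so checking them by name is only useful if you have put them on PATH.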

Creating solution

Run all scripts from the RecSys2018/recsys directory.

For all scripts except recsys_script.sh, activate the virtualenv yourself in your bash session before running them.

  1. In the RecSys2018 folder, create a splitted_data folder and put the Million Playlist Dataset there (RecSys2018/splitted_data/raw)

  2. Put the challenge set into the splitted_data folder (RecSys2018/splitted_data/challenge_set.json)

  3. Encode the Million Playlist Dataset and the challenge set

    bash recsys_script.sh --encoding 
  4. Train iALS and Starspace

    bash recsys_script.sh --update_models 
  5. Train name iALS

    bash train_nals.sh 
  6. Train SVD++

    bash train_svd_pp.sh
  7. Train Vowpal Wabbit

    bash train_vw.sh
  8. Create example files for Catboost

    bash create_examples.sh
  9. Create pools for Catboost from examples

    bash test_to_vw.sh
    bash vw_predict_train2.sh
    bash add_vw_t2.sh
    bash feature.sh
  10. Train Catboost

    bash cb.sh ~/catboost
  11. Create candidates for challenge set

    bash recsys_script.sh --update_candidates --train ../splitted_data/encoded_train.json --test ../splitted_data --test_dir
  12. Predict with Vowpal Wabbit model

    bash wv_on_unk_test.sh  ../splitted_data
    bash vw_predict.sh  ../splitted_data
    bash add_vw.sh ../splitted_data
  13. Apply trained models

    bash recsys_script.sh --apply --train ../splitted_data/encoded_train.json --test ../splitted_data --test_dir
  14. Decode created solution

    python utils/create_solution.py ../splitted_data/test_c_predictions \
                                    ../splitted_data/tracks.json \
                                    ../MIPT_MSU_solution.csv
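
Steps 3 to 14 can be chained so that a failure stops the run instead of letting later steps work on incomplete data. The following is a minimal sketch under stated assumptions, not a script from the repository: `run_step`, `run_pipeline`, `DRY_RUN` and `CATBOOST_PATH` are hypothetical names, and the commands are copied from the steps above. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of a driver for steps 3-14. run_step, run_pipeline, DRY_RUN and
# CATBOOST_PATH are hypothetical names, not part of the repository.

DRY_RUN="${DRY_RUN:-1}"                            # 1 = print only, 0 = execute
CATBOOST_PATH="${CATBOOST_PATH:-$HOME/catboost}"   # argument passed to cb.sh
DATA=../splitted_data
TRAIN="$DATA/encoded_train.json"

# Print one step; when DRY_RUN is 0, also execute it and report a failure.
run_step() {
    local label="$1"; shift
    echo "[step $label] $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || { echo "step $label failed: $*" >&2; return 1; }
    fi
}

# Chain the commands with && so the first failing step stops the run.
run_pipeline() {
    run_step 3  bash recsys_script.sh --encoding &&
    run_step 4  bash recsys_script.sh --update_models &&
    run_step 5  bash train_nals.sh &&
    run_step 6  bash train_svd_pp.sh &&
    run_step 7  bash train_vw.sh &&
    run_step 8  bash create_examples.sh &&
    run_step 9  bash test_to_vw.sh &&
    run_step 9  bash vw_predict_train2.sh &&
    run_step 9  bash add_vw_t2.sh &&
    run_step 9  bash feature.sh &&
    run_step 10 bash cb.sh "$CATBOOST_PATH" &&
    run_step 11 bash recsys_script.sh --update_candidates \
        --train "$TRAIN" --test "$DATA" --test_dir &&
    run_step 12 bash wv_on_unk_test.sh "$DATA" &&
    run_step 12 bash vw_predict.sh "$DATA" &&
    run_step 12 bash add_vw.sh "$DATA" &&
    run_step 13 bash recsys_script.sh --apply \
        --train "$TRAIN" --test "$DATA" --test_dir &&
    run_step 14 python utils/create_solution.py "$DATA/test_c_predictions" \
        "$DATA/tracks.json" ../MIPT_MSU_solution.csv
}
```

Sourcing this file and calling `run_pipeline` prints the 17 commands in order. To execute them, set `DRY_RUN=0` first and call `run_pipeline` from RecSys2018/recsys with the virtualenv activated.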

Usage recommendations

  • With recsys_script.sh, you can set the path to the Starspace binary with the --starspace_path option and the path to your Python virtualenv with the --env option.

  • Pass the path to the Catboost binary as the argument to cb.sh.

  • You will need about 100 GB of RAM.

  • Most of our programs create 32 threads.

  • We recommend training Catboost on a GPU, because it then takes several hours instead of days.
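
The memory and thread figures above can be checked before a long run. The following is a rough sketch for Linux, not part of the repository: `mem_total_gb` and `check_resources` are hypothetical helpers, memory is read from /proc/meminfo and rounded down to whole GiB, and the thresholds (100 and 32) are taken from the recommendations above.

```shell
#!/usr/bin/env bash
# Sketch of a pre-flight resource check (Linux). mem_total_gb and
# check_resources are hypothetical helper names.

# Print MemTotal from a meminfo-style file, converted from kB to whole GiB.
mem_total_gb() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' \
        "${1:-/proc/meminfo}" 2>/dev/null
}

# Warn when the machine is below the figures above (100 GB RAM, 32 threads).
# Optional arguments override the inputs: meminfo file, CPU count.
check_resources() {
    local have_gb have_cpus status=0
    have_gb="$(mem_total_gb "${1:-/proc/meminfo}")"
    have_cpus="${2:-$(getconf _NPROCESSORS_ONLN 2>/dev/null)}"
    if [ "${have_gb:-0}" -lt 100 ]; then
        echo "warning: ${have_gb:-0} GB RAM found, about 100 GB recommended" >&2
        status=1
    fi
    if [ "${have_cpus:-0}" -lt 32 ]; then
        echo "warning: ${have_cpus:-0} CPUs found, most programs start 32 threads" >&2
        status=1
    fi
    return "$status"
}
```

Calling `check_resources` with no arguments inspects the current machine and returns a non-zero status if either warning was printed. The warnings are advisory: the source only states approximate requirements.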

License: Apache License 2.0

Languages: Python 86.9%, Shell 13.1%