ywu94 / RecSys-Notes

Recommendation System learning notes, including classic papers, algorithm implementations, modeling tricks, and notes.

RecSys-Notes

Classic papers and resources on recommendation systems, along with Python implementations (focusing on PyTorch).

What should we consider when making recommendations? From Netflix:

We want our recommendations to be accurate in that they are relevant to the tastes of our members, but they also need to be diverse so that we can address the spectrum of a member’s interests versus only focusing on one. We want to be able to highlight the depth in the catalog we have in those interests and also the breadth we have across other areas to help our members explore and even find new interests. We want our recommendations to be fresh and responsive to the actions a member takes, such as watching a show, adding to their list, or rating; but we also want some stability so that people are familiar with their homepage and can easily find videos they’ve been recommended in the recent past.

Covered Model & Performance

| Model | Key Idea | Recommended Hyperparameter | Criteo Test AUC | Implementation |
| --- | --- | --- | --- | --- |
| Factorization Machine | Use embedding and dot product to model low-level interaction explicitly | | 0.792564 after one epoch | Paper, PyTorch |
| Field-aware Factorization Machine | Model interactions between different fields differently | | | |
| Deep Factorization Machine | Use FM to model low-level interaction explicitly and DNN to model high-level interaction implicitly | DNN: 3 * 400 | 0.801416 after two epochs | Paper, PyTorch |
| Deep Cross Network | Use Cross Net to model bit-level interaction between feature embeddings explicitly and DNN to model high-level interaction implicitly | Cross: 6; DNN: 2 * 1024 | 0.801345 after three epochs | Paper, PyTorch |
| Extreme Deep Factorization Machine | Introduce Compressed Interaction Network to enhance Cross Net, capturing feature interaction at vector level instead of bit level | CIN: 3 * 200; DNN: 4 * 400 | 0.804545 after two epochs | Paper, PyTorch |
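To make the "embedding and dot product" idea in the Factorization Machine row concrete, here is a minimal PyTorch sketch. It is not the implementation from this repo: the class name `FactorizationMachine`, the parameters `num_features` and `embed_dim`, and the assumption that every field is a single categorical index are all illustrative. It uses the standard identity that the sum of pairwise dot products equals half of (square of sum minus sum of squares).

```python
import torch
import torch.nn as nn


class FactorizationMachine(nn.Module):
    """Second-order FM over categorical fields (illustrative sketch only)."""

    def __init__(self, num_features: int, embed_dim: int):
        super().__init__()
        self.bias = nn.Parameter(torch.zeros(1))
        # One scalar weight per feature for the first-order (linear) term.
        self.linear = nn.Embedding(num_features, 1)
        # One latent vector per feature for the pairwise interaction term.
        self.embedding = nn.Embedding(num_features, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_fields) tensor of integer feature indices.
        first_order = self.linear(x).sum(dim=1).squeeze(-1)  # (batch,)
        v = self.embedding(x)  # (batch, num_fields, embed_dim)
        # sum_{i<j} <v_i, v_j> = 0.5 * ((sum_i v_i)^2 - sum_i v_i^2), summed over dims
        square_of_sum = v.sum(dim=1).pow(2)
        sum_of_square = v.pow(2).sum(dim=1)
        second_order = 0.5 * (square_of_sum - sum_of_square).sum(dim=1)  # (batch,)
        # Returns raw logits; apply a sigmoid to get click probabilities.
        return self.bias + first_order + second_order
```

A call such as `FactorizationMachine(num_features=100, embed_dim=8)(torch.randint(0, 100, (4, 5)))` returns one logit per row. The identity keeps the interaction term linear in the number of fields instead of quadratic.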

Data Preparation

Criteo Data

Criteo data can be downloaded from the Kaggle Displaying Ads Dataset. To prepare the data, follow these steps.

  • Git clone this repo to your local environment and change directory to your local repo

  • Create the raw data directory: mkdir ./Data/crieto/criteo_raw_artifact

  • Extract the Criteo archive dac.tar.gz and move train.txt and test.txt to ./Data/crieto/criteo_raw_artifact

  • Run the following commands in a shell:

    cd ./Data/crieto
    python3 split.py
    python3 prepare.py

    Note that the current implementation of prepare.py keeps all of the prepared data in memory, which may not be feasible on machines with limited memory. A workaround is to write the prepared data out in partitions.
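The partitioning workaround mentioned in the note could look roughly like the sketch below. It does not reproduce the logic of prepare.py. The function `prepare_in_partitions`, the `transform` callback, the `part_XXXXX.pkl` file naming, and the default partition size are hypothetical choices made for illustration. The only fact it relies on is that rows in the Criteo files are tab-separated.

```python
import pickle
from itertools import islice
from pathlib import Path


def prepare_in_partitions(src_path, out_dir, transform, rows_per_partition=100_000):
    """Stream src_path and write transformed rows to numbered partition files.

    Only one partition's worth of rows is held in memory at a time.
    Returns the list of partition paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(src_path, "r", encoding="utf-8") as src:
        part_idx = 0
        while True:
            # islice pulls at most rows_per_partition lines from the file iterator.
            chunk = [transform(line.rstrip("\n")) for line in islice(src, rows_per_partition)]
            if not chunk:
                break
            part_path = out_dir / f"part_{part_idx:05d}.pkl"
            with open(part_path, "wb") as out:
                pickle.dump(chunk, out)
            written.append(part_path)
            part_idx += 1
    return written


def split_tab_fields(line):
    # Placeholder transform: split one tab-separated row into raw string fields.
    # Real feature encoding would go here instead.
    return line.split("\t")
```

For example, `prepare_in_partitions("train.txt", "prepared", split_tab_fields)` writes one pickle file per 100,000 rows. A training loop can then load the partitions one at a time rather than holding the full dataset in memory.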

License: MIT License


Languages

Python 84.9%, Jupyter Notebook 15.1%