RecSys-Notes

Classic papers and resources on recommender systems, along with Python implementations (focusing on PyTorch).

What should we consider in recommendation? From Netflix:

We want our recommendations to be accurate in that they are relevant to the tastes of our members, but they also need to be diverse so that we can address the spectrum of a member’s interests versus only focusing on one. We want to be able to highlight the depth in the catalog we have in those interests and also the breadth we have across other areas to help our members explore and even find new interests. We want our recommendations to be fresh and responsive to the actions a member takes, such as watching a show, adding to their list, or rating; but we also want some stability so that people are familiar with their homepage and can easily find videos they’ve been recommended in the recent past.

Covered Model & Performance

Model	Key Idea	Recommended Hyperparameter	Criteo Test AUC	Implementation
Factorization Machine	Use embedding and dot product to model low-level interaction explicitly		`0.792564` after one epoch	Paper PyTorch
Field-aware Factorization Machine	Model interactions between different fields differently
Deep Factorization Machine	Use `FM` to model low-level interaction explicitly and `DNN` to model high-level interaction implicitly	DNN: `3 * 400`	`0.801416` after two epochs	Paper PyTorch
Deep Cross Network	Use `Cross Net` to model bit-level interaction between feature embeddings explicitly and `DNN` to model high-level interaction implicitly	Cross: `6` DNN: `2*1024`	`0.801345` after three epochs	Paper PyTorch
Extreme Deep Factorization Machine	Introduce `Compressed Interaction Network` to enhance Cross Net, capturing feature interaction at vector level instead of bit level	CIN: `3200` DNN: `4400`	`0.804545` after two epochs	Paper PyTorch
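
The "embedding and dot product" idea in the Factorization Machine row can be illustrated with a small sketch. It assumes each field contributes one embedding vector with feature value 1 (a common setup for categorical data such as Criteo), so the interaction term is the sum of dot products over all pairs of embeddings. The function names are hypothetical and this is not the repository's PyTorch code; it is kept dependency-free only to show that the pairwise sum can be computed in linear time.

```python
def fm_pairwise_naive(vectors):
    """Sum of dot products over all pairs i < j, O(n^2 * k)."""
    total = 0.0
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a * b for a, b in zip(vectors[i], vectors[j]))
    return total


def fm_pairwise_fast(vectors):
    """Same quantity in O(n * k), using
    sum_{i<j} <v_i, v_j> = 0.5 * sum_d ((sum_i v_id)^2 - sum_i v_id^2).
    """
    total = 0.0
    for dim in zip(*vectors):  # iterate over embedding dimensions
        s = sum(dim)
        total += s * s - sum(x * x for x in dim)
    return 0.5 * total


# Toy embeddings: three fields, embedding size two (illustrative values only).
embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
naive = fm_pairwise_naive(embeddings)
fast = fm_pairwise_fast(embeddings)
```

In a tensor library the fast form becomes a sum over the field axis, a square, and a subtraction, which is why FM layers are usually written that way.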

Data Preparation

Criteo Data

Criteo data can be downloaded from the Kaggle Displaying Ads Dataset. To prepare the data, follow the steps below.

1. Git clone this repo to your local environment and change directory to your local repo.
2. Create the directory: `mkdir ./Data/crieto/criteo_raw_artifact`
3. Unzip the Criteo data `dac.tar.gz` and move `train.txt` and `test.txt` to `./Data/crieto/criteo_raw_artifact`.
4. Run the following commands in the shell:
```
cd ./Data/crieto
python3 split.py
python3 prepare.py
```
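
The directory creation and unzip steps above can also be scripted. The following is a minimal sketch, assuming the archive is a gzip-compressed tar containing `train.txt` and `test.txt`; the helper name `extract_criteo` is hypothetical and is not part of this repo.

```python
import shutil
import tarfile
from pathlib import Path


def extract_criteo(archive, dest="./Data/crieto/criteo_raw_artifact",
                   wanted=("train.txt", "test.txt")):
    """Copy only the wanted files from the archive into dest.

    Returns the sorted list of file names that were extracted.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)  # same effect as the mkdir step
    found = []
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar:
            # Use only the base name so members cannot escape dest.
            name = Path(member.name).name
            if member.isfile() and name in wanted:
                with tar.extractfile(member) as src, open(dest / name, "wb") as out:
                    shutil.copyfileobj(src, out)  # stream, do not load in memory
                found.append(name)
    return sorted(found)


# Example call (assumes dac.tar.gz is in the current directory):
# extract_criteo("dac.tar.gz")
```

Copying through `extractfile` rather than calling `extractall` keeps any other files in the archive out of the target directory.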
Note that the current implementation of `prepare.py` keeps all of the prepared data in memory, which may not be feasible on machines with limited memory. A workaround is to write the prepared data out in partitions.
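
One way to approach the partition workaround is to stream rows and flush them to disk in fixed-size chunks. The sketch below is an assumption about how that could look, not a description of what `prepare.py` does; the function name `write_partitions`, the file naming scheme, and the default chunk size are all hypothetical.

```python
from itertools import islice
from pathlib import Path


def write_partitions(rows, out_dir, rows_per_part=1_000_000, prefix="part"):
    """Write an iterable of newline-terminated rows into numbered files.

    Only one chunk of rows_per_part rows is held in memory at a time.
    Returns the list of partition paths in write order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = iter(rows)
    paths = []
    while True:
        chunk = list(islice(rows, rows_per_part))  # next chunk, may be short
        if not chunk:
            break
        path = out_dir / f"{prefix}-{len(paths):05d}.txt"
        with open(path, "w") as f:
            f.writelines(chunk)
        paths.append(path)
    return paths


# Example call (assumes a prepared text file with one row per line):
# with open("train.txt") as f:
#     write_partitions(f, "./partitions", rows_per_part=500_000)
```

A training loop can then read one partition at a time instead of loading the full dataset.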
