pintonos / xsDeepFwFM_deprecated

Accelerating Inference for Recommendation Systems

Home Page: https://arxiv.org/abs/2002.06987

DeepLight: Deep Lightweight Feature Interactions

Deploying end-to-end deep factorization machines is hampered by their prediction latency. To address this issue, we accelerate prediction by applying structural pruning to DeepFwFM, which yields 46x speed-ups without sacrificing state-of-the-art performance on the Criteo dataset.

Please refer to the arXiv paper (https://arxiv.org/abs/2002.06987) if you are interested in the details.

Original paper:

@inproceedings{deeplight,
  title={DeepLight: Deep Lightweight Feature Interactions for Accelerating CTR Predictions in Ad Serving},
  author={Wei Deng and Junwei Pan and Tian Zhou and Deguang Kong and Aaron Flores and Guang Lin},
  booktitle={International Conference on Web Search and Data Mining (WSDM'21)},
  year={2021}
}

This repository extends the original work with additional model compression and acceleration experiments, all conducted on the Twitter dataset from the RecSys 2020 Challenge.

Environment

  1. Python 3.7

  2. PyTorch 1.7.1

  3. pandas

  4. scikit-learn

  5. torch-summary (https://github.com/TylerYep/torch-summary)

Input Format

This implementation requires the input data in the following format (a toy example follows the list):

  • Xi: [[ind1_1, ind1_2, ...], [ind2_1, ind2_2, ...], ..., [indi_1, indi_2, ..., indi_j, ...], ...]
    • indi_j is the feature index of feature field j of sample i in the dataset
  • Xv: [[val1_1, val1_2, ...], [val2_1, val2_2, ...], ..., [vali_1, vali_2, ..., vali_j, ...], ...]
    • vali_j is the feature value of feature field j of sample i in the dataset
    • vali_j can be either binary (1/0, for binary/categorical features) or float (e.g., 10.24, for numerical features)
  • y: target of each sample in the dataset (1/0 for classification, numeric number for regression)
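For illustration, here is a toy instance of this format with two samples and three feature fields, the first two categorical and the third numerical; the feature indices are made up for the example:

# Toy example (hypothetical indices): 2 samples, 3 feature fields.
# Fields 0-1 are categorical, field 2 is numerical.
Xi = [[3, 17, 0],      # sample 1: feature index per field
      [5, 17, 0]]      # sample 2
Xv = [[1, 1, 10.24],   # categorical fields carry value 1, numerical fields their raw value
      [1, 1, 3.5]]
y = [1, 0]             # binary targets for classification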

How to run the dense models

The repository already includes a tiny dataset for testing. You can run the following models with the commands below.

LR: logistic regression

$ python main_all.py -use_fm 0 -use_fwfm 0 -use_deep 0 -use_lw 0 -use_logit 1 > ./logs/all_logistic_regression

FM: factorization machine

$ python main_all.py -use_fm 1 -use_fwfm 0 -use_deep 0 -use_lw 0 > ./logs/all_fm_vanilla

FwFM: field weighted factorization machine

$ python main_all.py -use_fm 0 -use_fwfm 1 -use_deep 0 -use_lw 0 > ./logs/all_fwfm_vanilla

DeepFM: deep factorization machine

$ python main_all.py -use_fm 1 -use_fwfm 0 -use_deep 1 -use_lw 0 > ./logs/all_deepfm_vanilla

NFM: neural factorization machine

$ python NFM.py > ./logs/all_nfm

xDeepFM: extreme deep factorization machine

For xDeepFM, you can use the implementation at https://github.com/Leavingseason/xDeepFM
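For reference, the field-weighted pairwise interaction that distinguishes FwFM from plain FM can be sketched in a few lines of PyTorch. This is a minimal sketch of the idea, not the repository's exact implementation; the tensor names are illustrative:

import torch

# Sketch of the FwFM interaction term: sum over field pairs (i < j) of
# <v_i, v_j> * r_ij, where r_ij is a learnable field-pair weight.
# emb: (batch, n_fields, k) field embeddings; R: (n_fields, n_fields) field matrix.
def fwfm_interactions(emb: torch.Tensor, R: torch.Tensor) -> torch.Tensor:
    inner = torch.matmul(emb, emb.transpose(1, 2))   # (batch, n_fields, n_fields) pairwise inner products
    iu = torch.triu(torch.ones_like(R), diagonal=1)  # upper-triangular mask: count each pair once
    return ((inner * R) * iu).sum(dim=(1, 2))        # field-weighted sum per sample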

How to conduct structural pruning

With the default settings, the code reaches 0.8123 AUC when you apply 90% sparsity to the DNN component and the field matrix R, and about 40% sparsity (0.90 x 0.444) to the embeddings:

$ python main_all.py -l2 6e-7 -n_epochs 10 -warm 2 -prune 1 -sparse 0.90 -prune_deep 1 -prune_fm 1 -prune_r 1 -use_fwlw 1 -emb_r 0.444 -emb_corr 1. > ./logs/deepfwfm_l2_6e_7_prune_all_and_r_warm_2_sparse_0.90_emb_r_0.444_emb_corr_1
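Conceptually, the pruning zeroes out the smallest-magnitude weights until a target sparsity is reached. A minimal sketch of magnitude-based pruning for a single weight tensor (illustrative only; main_all.py applies this per component, after the warm-up epochs controlled by -warm):

import torch

# Illustrative magnitude pruning: zero the smallest |w| entries until a
# fraction `sparsity` of the tensor is zero. Not the exact routine in main_all.py.
def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    k = int(weight.numel() * sparsity)       # number of entries to zero out
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold).float()  # mask out small-magnitude weights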

Useful python scripts

Using the Twitter dataset

$ python main_all.py -use_fm 0 -use_fwfm 1 -use_deep 1 -use_lw 1 -use_fwlw 1 -use_cuda 1 -n_epochs 1 -dataset twitter -twitter_category like

Pruning

$ python main_all.py -use_fm 0 -use_fwfm 1 -use_deep 1 -use_lw 1 -n_epochs 10 -dataset tiny-criteo -use_cuda 1 -prune 1 -l2 6e-7 -warm 2 -sparse 0.9 -prune_deep 1 -prune_fm 1 -prune_r 1 -use_fwlw 1 -emb_r 0.444 -emb_corr 1.

QR Embeddings

$ python main_all.py -use_fm 0 -use_fwfm 1 -use_deep 1 -use_lw 1 -use_fwlw 1 -use_cuda 1 -n_epochs 3 -dataset criteo -embedding_bag 1 -qr_flag 1
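The quotient-remainder (QR) trick replaces one large embedding table with two much smaller ones whose lookups are combined per index. A minimal sketch of the idea behind -qr_flag 1 (illustrative; the repository's implementation is built on EmbeddingBag, per -embedding_bag 1):

import torch
import torch.nn as nn

# Illustrative QR embedding: index i maps to (i // c, i % c), so two tables of
# roughly n/c and c rows replace one table of n rows. Element-wise product is
# one of several possible ways to combine the two lookups.
class QREmbedding(nn.Module):
    def __init__(self, num_embeddings: int, dim: int, num_collisions: int = 4):
        super().__init__()
        self.num_collisions = num_collisions
        self.emb_q = nn.Embedding((num_embeddings + num_collisions - 1) // num_collisions, dim)
        self.emb_r = nn.Embedding(num_collisions, dim)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        q = idx // self.num_collisions   # quotient index
        r = idx % self.num_collisions    # remainder index
        return self.emb_q(q) * self.emb_r(r)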

Quantization for sparse models

$ python quantization.py -use_deep 1 -use_fwfm 1 -n_epochs 3 -prune 1 -sparse 0.90 -use_fwlw 1 -save_model_path ./saved_models/full_pruned_DeepFwFM_l2_6e-07_sparse_0.9_seed_0 -dynamic_quantization 0 -quantization_aware 0 -static_quantization 1
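For orientation, -static_quantization 1 corresponds to PyTorch's post-training static quantization, which follows a prepare/calibrate/convert flow. A minimal sketch, assuming a trained float model and a calibration loader; real models also need QuantStub/DeQuantStub around the float parts, and this is not the exact code in quantization.py:

import torch

# Post-training static quantization sketch (assumes `model` and `calib_loader` exist,
# and that the model's forward takes (Xi, Xv) as in this repository).
model.eval()
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')  # x86 server backend
prepared = torch.quantization.prepare(model)                      # insert observers
with torch.no_grad():
    for Xi, Xv, y in calib_loader:                                # run a few calibration batches
        prepared(Xi, Xv)
quantized = torch.quantization.convert(prepared)                  # int8 weights and activations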

Quantization for QR Embeddings

$ python quantization.py -use_deep 1 -use_fwfm 1 -use_lw 1 -use_fwlw 1 -n_epochs 3 -save_model_path ./saved_models/full_DeepFwFM_l2_3e-07_qr -dynamic_quantization 0 -quantization_aware 0 -static_quantization 1 -embedding_bag 1 -qr_flag 1

Preprocess full Twitter dataset

To download the full dataset, use the link below: https://recsys-twitter.com/

For preprocessing use this repository: https://github.com/pintonos/deeplearning/tree/main/RecSys2020/01_Preprocess

It implements the preprocessing of the RecSys 2020 winning solution's features by RAPIDS.AI.

Move the preprocessed files to the ./data/large folder.

Then change to the data folder and process the raw data:

$ python preprocess_twitter.py

Preprocess full Criteo dataset

The Criteo dataset has binary labels with 13 numerical features and 26 categorical features.

To download the full dataset, use the link below: http://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/

Unzip the raw data into the ./data/large folder:

$ tar xvzf dac.tar.gz

Then change to the data folder and process the raw data:

$ python preprocess_criteo.py

Once the full dataset is ready, switch the data paths in main_all.py from the tiny sample to the full files as follows:

# before: tiny sample shipped with the repository
#result_dict = data_preprocess.read_data('./data/tiny_train_input.csv', './data/category_emb', criteo_num_feat_dim, feature_dim_start=0, dim=39)
#test_dict = data_preprocess.read_data('./data/tiny_test_input.csv', './data/category_emb', criteo_num_feat_dim, feature_dim_start=0, dim=39)
# after: full preprocessed Criteo files
result_dict = data_preprocess.read_data('./data/large/train.csv', './data/large/criteo_feature_map', criteo_num_feat_dim, feature_dim_start=1, dim=39)
test_dict = data_preprocess.read_data('./data/large/valid.csv', './data/large/criteo_feature_map', criteo_num_feat_dim, feature_dim_start=1, dim=39)

How to analyze the prediction latency

Before you start, download the Sparse-Matrix library from https://github.com/uestla/Sparse-Matrix.

After the setup, change the include path on line 23 of the .cpp file to your local directory.

$ cd latency
$ g++ criteo_latency.cpp -o criteo.out

To avoid setting up the environment, you can also run the compiled binary directly:

$ ./criteo.out
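To cross-check latency from the Python side, a simple wall-clock measurement of the forward pass can be used. This is a minimal sketch, assuming a loaded model and prepared input tensors Xi and Xv:

import time
import torch

# Rough latency check from Python (assumes `model`, `Xi`, `Xv` are already set up).
model.eval()
with torch.no_grad():
    for _ in range(10):          # warm-up runs
        model(Xi, Xv)
    n_runs = 100
    start = time.perf_counter()
    for _ in range(n_runs):
        model(Xi, Xv)
    elapsed_ms = (time.perf_counter() - start) * 1000 / n_runs
print(f"avg forward latency: {elapsed_ms:.3f} ms")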

Acknowledgement

https://github.com/nzc/dnn_ctr
