xpert

Code for the XPERT algorithm from the paper "Personalized Retrieval over Millions of Items".

This folder contains the code for XPERT and the Amazon-1M dataset.

Download the data and model:

Download Amazon-1M.zip and unzip it to get the Amazon-1M dataset.
Download Amazon-1M-model.zip and unzip it to get the trained model.

Running the inference code of XPERT on the Amazon-1M data:

Set the environment variables in the setup.sh file: set CURR_DIR and PYTHONPATH to the current directory, and set DATA_PATH to the directory where the data is present (this would be CURR_DIR/data).
Run source setup.sh
We have provided a trained model in models/model_24.python, which will be used for evaluation.
Run python src/evaluate_XPERT.py configs/evaluation.yaml
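
Before launching the evaluation, it can help to confirm that the variables set by setup.sh are visible to Python. The following is a minimal sketch, not part of the repository: the helper name missing_env_vars is hypothetical, and it only checks that CURR_DIR, PYTHONPATH and DATA_PATH are set and non-empty.

```python
import os

# Variables that setup.sh is expected to export (names taken from the steps above).
REQUIRED_VARS = ("CURR_DIR", "PYTHONPATH", "DATA_PATH")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to the process environment; passing a dict makes the
    check easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Not set: " + ", ".join(missing) + " (run `source setup.sh` first)")
    else:
        print("Environment looks ready; DATA_PATH =", os.environ["DATA_PATH"])
```

If the script reports missing variables, run source setup.sh in the same shell session and try again.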

Data format:

item_features.txt contains the 768-dimensional embeddings of the Amazon product titles, which were extracted from a pretrained 6-layer DistilBERT base model. The format of each row is: <item_id> <item_embedding>
final_data_test.txt and final_data_train.txt contain the test and train data respectively, in the following format: <user_id> <label_time>
label = comma-separated list of <product_id>, which are treated as the label
label_time = timestamp of the last reviewed product_id among the labels
history = space-separated list of <product_id>, which are the user history

feat_data_bxml and user_data_test contain binarized files extracted from the files above, and are shared for fast inference.
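
As an illustration of the item_features.txt row format described above, here is a minimal parsing sketch. It is not the repository's loader: it assumes that the item id and the 768 embedding values are separated by whitespace and that the id can be kept as a string. The function names parse_item_row and load_item_features are hypothetical.

```python
EMBEDDING_DIM = 768  # embedding size stated in the data format description


def parse_item_row(line, dim=EMBEDDING_DIM):
    """Split one row into (item_id, embedding).

    Assumption: tokens are whitespace-separated, with the item id first
    and `dim` float values after it.
    """
    tokens = line.split()
    if len(tokens) != dim + 1:
        raise ValueError(f"expected {dim + 1} tokens, got {len(tokens)}")
    item_id = tokens[0]
    embedding = [float(value) for value in tokens[1:]]
    return item_id, embedding


def load_item_features(path, dim=EMBEDDING_DIM):
    """Read the whole file into a dict mapping item_id -> embedding list."""
    features = {}
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                item_id, embedding = parse_item_row(line, dim)
                features[item_id] = embedding
    return features
```

For a file with a million items, a plain dict of Python lists uses a lot of memory; the binarized files are the intended route for fast inference, so this sketch is mainly useful for inspecting a few rows.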

Cite as:

@inproceedings{vemuri2023personalized,
  title={Personalized Retrieval over Millions of Items},
  author={Vemuri, Hemanth and Agrawal, Sheshansh and Mittal, Shivam and Saini, Deepak and Soni, Akshay and Sambasivan, Abhinav V and Lu, Wenhao and Wang, Yajun and Parsana, Mehul and Kar, Purushottam and others},
  booktitle={Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages={1014--1022},
  year={2023}
}

Currently added features:

Amazon-1M dataset.
Inference model and scripts for XPERT.

Features to add:

Add the Amazon-10M dataset.
Add text format for both the Amazon-1M and Amazon-10M datasets.
Add dataset creation scripts.
Add base embedding extraction scripts and model.
Add global interest creation (clustering) code.
Add training scripts for morph operators.
