Code for the XPERT algorithm from "Personalized Retrieval over Millions of Items"
This folder contains the code for XPERT and the Amazon-1M dataset.
- Download and unzip Amazon-1M.zip, which contains the Amazon-1M dataset.
- Download and unzip Amazon-1M-model.zip, which contains the trained model.
- Set the environment variables in setup.sh. Set CURR_DIR and PYTHONPATH to the current directory. DATA_PATH should point to where the data is located (CURR_DIR/data).
- Run:
  source setup.sh
- We have provided a trained model in models/model_24.python, which will be used for evaluation.
- Run:
  python src/evaluate_XPERT.py configs/evaluation.yaml
- item_features.txt contains the 768-dimensional embeddings of the Amazon product titles, which were extracted from a pretrained 6-layer DistilBERT base model. The format of each row is: <item_id> <item_embedding>
- final_data_test.txt and final_data_train.txt contain the test and train data, respectively, in the following format: <user_id> <label_time>
  - label = List of comma separated: <product_id> which are treated as the label
  - label_time = Timestamp of the last reviewed product_id among the labels
  - history = List of space separated: <product_id>: which are the user history
feat_data_bxml and user_data_test contain binarized versions of the files above and are shared for fast inference.
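
As a rough illustration of the item_features.txt row format described above, the following sketch reads rows of the form <item_id> <item_embedding>. It assumes that each row is whitespace-separated, with the item ID as the first token followed by the embedding values; the exact delimiter is not stated here, so check it against the actual file. The helper names `parse_item_row` and `load_item_features` are hypothetical and are not part of the XPERT code.

```python
import io
from typing import Dict, Iterable, List, Tuple

EMBEDDING_DIM = 768  # embedding size stated for item_features.txt


def parse_item_row(line: str, dim: int = EMBEDDING_DIM) -> Tuple[str, List[float]]:
    """Parse one row, ASSUMING the layout '<item_id> v1 v2 ... v_dim' (whitespace-separated)."""
    item_id, *values = line.split()
    if len(values) != dim:
        raise ValueError(
            f"expected {dim} values for item {item_id!r}, got {len(values)}"
        )
    return item_id, [float(v) for v in values]


def load_item_features(
    lines: Iterable[str], dim: int = EMBEDDING_DIM
) -> Dict[str, List[float]]:
    """Build an item_id -> embedding mapping, skipping blank lines."""
    features: Dict[str, List[float]] = {}
    for line in lines:
        if line.strip():
            item_id, vector = parse_item_row(line, dim)
            features[item_id] = vector
    return features


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file, with 3-dimensional vectors.
    demo = io.StringIO("itemA 0.1 0.2 0.3\nitemB 0.4 0.5 0.6\n")
    feats = load_item_features(demo, dim=3)
    print(sorted(feats))
```

To read the real file, pass an open file handle, for example `load_item_features(open(path))`, where `path` points to item_features.txt under DATA_PATH. For fast inference, the binarized files are likely the better choice.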
@inproceedings{vemuri2023personalized,
title={Personalized Retrieval over Millions of Items},
author={Vemuri, Hemanth and Agrawal, Sheshansh and Mittal, Shivam and Saini, Deepak and Soni, Akshay and Sambasivan, Abhinav V and Lu, Wenhao and Wang, Yajun and Parsana, Mehul and Kar, Purushottam and others},
booktitle={Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages={1014--1022},
year={2023}
}
- Amazon-1M dataset.
- Inference model and scripts for XPERT.
- Add Amazon-10M dataset also.
- Add text format for both Amazon-1M and Amazon-10M dataset.
- Add dataset creation scripts.
- Add base embedding extraction scripts and model.
- Add global interest creation (clustering) code.
- Add training scripts for morph operators.