imcsq / KDD23-G3-RecformerMini


Implementation of Recformer

Operating System: Linux

Environment Setup

  1. Run pip3 install -r requirements.txt to install the required Python packages (a quick sanity check is sketched below).
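
If the installation succeeds, a quick check such as the one below should run without import errors. The package names here are assumptions based on the repo's use of PyTorch Lightning and a Longformer backbone (via transformers), not read from requirements.txt.

```python
# Optional sanity check after installation.
# Assumed packages: torch, pytorch_lightning, transformers.
import torch
import pytorch_lightning as pl
import transformers

print("torch:", torch.__version__)
print("pytorch_lightning:", pl.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```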

Data Preparing

  1. Run cd pretrain_data; python3 preprocess.py -d -m -s; cd .. to pre-process the pre-training data. The -d, -m, and -s flags trigger data downloading, metadata extraction, and interaction-sequence generation, respectively.
  2. Run cd finetune_data; python3 preprocess.py -d -m -s; cd .. to pre-process the fine-tuning data, using the same -d, -m, and -s flags (a hypothetical sketch of this flag handling follows the list).
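
For orientation, here is a hypothetical sketch of how preprocess.py might wire the -d, -m, and -s flags with argparse; the actual option handling, helper names, and processing steps in the repository may differ.

```python
# Hypothetical sketch only: the repo's preprocess.py may implement these steps differently.
import argparse

def download_raw_data():
    print("downloading raw data ...")            # placeholder step

def extract_metadata():
    print("extracting item metadata ...")        # placeholder step

def build_interaction_sequences():
    print("building per-user interaction sequences ...")  # placeholder step

def main():
    parser = argparse.ArgumentParser(description="Pre-process the dataset")
    parser.add_argument("-d", action="store_true", help="download the raw data")
    parser.add_argument("-m", action="store_true", help="extract item meta data")
    parser.add_argument("-s", action="store_true", help="generate interaction sequences")
    args = parser.parse_args()

    if args.d:
        download_raw_data()
    if args.m:
        extract_metadata()
    if args.s:
        build_interaction_sequences()

if __name__ == "__main__":
    main()
```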

Conduct Experiment

  1. Run bash 1-pretrain.sh to perform pre-training.
  2. Run python 2-convert_pretrained_ckpt.py to convert the PyTorch Lightning checkpoint into a plain PyTorch checkpoint (a conceptual sketch follows this list).
  3. Run bash 3-finetune.sh to fine-tune on the 6 datasets used in the original work.
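
Conceptually, step 2 strips the LightningModule wrapper prefix from the pre-training checkpoint and saves a plain PyTorch state_dict. The sketch below is a minimal illustration under assumed file names and an assumed "model." prefix; it is not the script's actual interface.

```python
# Minimal sketch, assuming the checkpoint path and the "model." attribute prefix.
import torch

ckpt = torch.load("pretrain.ckpt", map_location="cpu")  # Lightning checkpoint
state_dict = ckpt["state_dict"]                          # weights nested under "state_dict"

# Drop the wrapper prefix added by the LightningModule (assumed to be "model.").
cleaned = {k.removeprefix("model."): v for k, v in state_dict.items()}

torch.save(cleaned, "recformer_pretrained.pt")           # plain PyTorch checkpoint
```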

Description of Source Files

  1. pretrain_data/preprocess.py pre-processes the pre-training dataset.
  2. finetune_data/preprocess.py pre-processes the fine-tuning datasets.
  3. lightning_dataloader.py is the dataloader for pre-training.
  4. dataloader.py is the dataloader for fine-tuning.
  5. collator.py collects and batches data; its output can be fed directly to pre-training, fine-tuning, evaluation, and testing of the model.
  6. recformer/tokenization.py tokenizes item sequences into token ids, token position ids, token type ids, and item position ids (see the toy sketch after this list).
  7. recformer/models.py implements the base model, which extends Longformer with 4 embedding layers, along with the models for pre-training and prediction.
  8. lightning_litwrapper.py is the LightningModule wrapper that simplifies pre-training with PyTorch Lightning.
  9. lightning_pretrain.py is the script that runs pre-training with the PyTorch Lightning framework.
  10. finetune.py fine-tunes the pre-trained model.
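
To make items 5 and 6 more concrete, the toy sketch below shows the kind of parallel id streams the tokenizer produces for an item sequence (token ids, token position ids, token type ids, item position ids). The attribute layout, vocabulary, and type-id convention here are illustrative assumptions, not the repository's exact encoding.

```python
# Illustrative only: real ids come from the Longformer tokenizer and the repo's
# own key/value type conventions in recformer/tokenization.py.
def toy_encode(items, vocab):
    token_ids, token_pos_ids, token_type_ids, item_pos_ids = [], [], [], []
    for item_idx, item in enumerate(items):
        pos = 0  # token positions restart for each item (assumption)
        for key, value in item.items():
            # type id 0 = attribute key, 1 = attribute value (assumption)
            for word, type_id in [(key, 0)] + [(w, 1) for w in value.split()]:
                token_ids.append(vocab.setdefault(word, len(vocab)))
                token_pos_ids.append(pos); pos += 1
                token_type_ids.append(type_id)
                item_pos_ids.append(item_idx)
    return token_ids, token_pos_ids, token_type_ids, item_pos_ids

items = [{"title": "red shirt", "brand": "acme"},
         {"title": "blue jeans", "brand": "denimco"}]
print(toy_encode(items, vocab={}))
```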
