RankFromSets

This code accompanies the RankFromSets SDSS submission.

To view the above visualization in a browser, please download this HTML file.

Environment

These experiments were conducted on a Red Hat Linux cluster with NVIDIA P100 GPUs.

Create the Python environment with the Anaconda package manager:

conda env create -f environment.yml

Data format

The {train, valid, test}.tsv files contain observed user-item interactions.
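As a rough illustration of reading such a file, the sketch below parses tab-separated rows with the standard library. The column layout (an integer user id followed by an integer item id, with no header row) is an assumption made for this example and is not specified in this README; `read_interactions` is a hypothetical helper, not part of the repository.

```python
import csv
import io


def read_interactions(handle):
    """Parse tab-separated rows into a list of (user, item) integer pairs.

    Assumes two columns per row and no header; adjust to the real files.
    """
    reader = csv.reader(handle, delimiter="\t")
    return [(int(row[0]), int(row[1])) for row in reader if row]


# Tiny in-memory stand-in for train.tsv (hypothetical contents).
sample = io.StringIO("0\t3\n0\t7\n1\t3\n")
interactions = read_interactions(sample)
print(interactions)
```

To read a real file, pass an open file handle such as `open("train.tsv", newline="")` in place of the in-memory sample.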

The item_attributes_csr.npz file stores a matrix of shape (n_items, n_attributes) in compressed sparse row (CSR) format. For example, if the data are documents in bag-of-words format, each row is a document and the attributes are the words.
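The following sketch shows one way to build and reload a matrix of this shape for a toy bag-of-words corpus. It assumes the file is written with `scipy.sparse.save_npz`, which this README does not confirm; the toy counts and the temporary path are made up for illustration.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

# Toy bag-of-words counts: 3 documents (items) by 4 vocabulary words (attributes).
dense = np.array([[1, 0, 2, 0],
                  [0, 0, 0, 3],
                  [0, 1, 0, 0]])
item_attributes = sp.csr_matrix(dense)

# Assumption: the .npz file is produced by scipy.sparse.save_npz.
path = os.path.join(tempfile.mkdtemp(), "item_attributes_csr.npz")
sp.save_npz(path, item_attributes)

loaded = sp.load_npz(path)
n_items, n_attributes = loaded.shape

# Column indices of the nonzero attributes of item 0, i.e. its attribute set.
attributes_of_item_0 = loaded[0].indices
print(n_items, n_attributes, attributes_of_item_0)
```

Storing the matrix in CSR format keeps row slicing cheap, which suits looking up the attribute set of one item at a time.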

Synthetic Example

We omit the raw arXiv data and the food tracking data because they contain private user data.

This example follows the square kernel example in the reproducibility supplement.

Generate data in /tmp/dat/simulation_%d, where %d is the replication number, from 1 to 30:

export DAT=/tmp

python build_simulation_dataset.py
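To check the generated replications, the directory names described above can be enumerated as in this sketch. It assumes the script writes under `$DAT/dat`, which is inferred from the `/tmp/dat/simulation_%d` path and `DAT=/tmp` above rather than from the script itself.

```python
import os

# Assumption: output lands in $DAT/dat/simulation_<k> for k = 1..30.
dat = os.environ.get("DAT", "/tmp")
simulation_dirs = [
    os.path.join(dat, "dat", "simulation_%d" % k) for k in range(1, 31)
]

# Report which replication directories are not present yet.
missing = [d for d in simulation_dirs if not os.path.isdir(d)]
print("%d of %d replication directories missing" % (len(missing), len(simulation_dirs)))
```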

Launch the best-performing parameter settings with the SLURM manager for the inner product, deep, and residual regression functions:

PYTHONPATH=. python experiment/arxiv/grid.py

Hyperparameters

We find that large batch sizes significantly improve performance. See config.yml for the best-performing hyperparameters.


MIT License
