rbnuria / Purchase100-dataset

A repo to download and preprocess the Purchase100 dataset extracted from Kaggle: Acquire Valued Shoppers Challenge

Purchase100 dataset

A repo to preprocess the Purchase100 dataset extracted from Kaggle: Acquire Valued Shoppers Challenge

Download preprocessed dataset

The integrity of the downloaded file can be checked against the following MD5 hash:

  • purchase100.npz : 0d7538b9806e7ee622e1a252585e7768
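The check above can be scripted. The following is a minimal sketch using only the standard library; the function names `md5_of_file` and `verify_md5` are illustrative and not part of this repo, and the path `./purchase100.npz` in the usage note assumes the file sits in the current directory.

```python
import hashlib

# MD5 hash listed in this README for purchase100.npz.
EXPECTED_MD5 = "0d7538b9806e7ee622e1a252585e7768"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches `expected`."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_md5('./purchase100.npz')` should return `True` for an intact download. Note that an MD5 match only guards against corrupted or truncated files, not against deliberate tampering.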

Steps to preprocess the dataset

  1. Download the transactions.csv.gz file from https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data
  2. Run preprocess_dataset.py <path_to_transactions.csv.gz> to generate purchase100.npz
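Before running step 2, it can be useful to confirm that the download from step 1 is a readable gzip-compressed CSV. The sketch below is not part of this repo: `peek_csv_gz` is a hypothetical helper that reads only the header and the first few rows, so it makes no assumption about the column names in transactions.csv.gz.

```python
import csv
import gzip
from itertools import islice


def peek_csv_gz(path, n_rows=5):
    """Return (header, first n_rows rows) of a gzip-compressed CSV file."""
    # "rt" opens the compressed stream in text mode; only the rows that are
    # actually requested get decompressed.
    with gzip.open(path, "rt", newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    return header, rows
```

For example, `header, rows = peek_csv_gz('transactions.csv.gz')` followed by `print(header)` shows the available columns without decompressing the whole file.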

How to use it

   import numpy as np

   # Load the preprocessed archive and extract its two arrays
   data = np.load('./purchase100.npz')
   features = data['features']
   labels = data['labels']
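Once the two arrays are loaded, a common next step is a shuffled train/test split. The following is a minimal sketch, not something this repo provides: `shuffled_split`, the 80/20 ratio, and the seed are illustrative choices. Since this README does not state the array shapes or dtypes, the commented usage prints them first.

```python
import numpy as np


def shuffled_split(features, labels, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (train, test) pairs of arrays."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    rng = np.random.default_rng(seed)
    # One permutation is applied to both arrays so rows stay aligned.
    order = rng.permutation(len(features))
    n_test = int(round(len(features) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (
        (features[train_idx], labels[train_idx]),
        (features[test_idx], labels[test_idx]),
    )


# Usage, assuming purchase100.npz is in the current directory:
#   data = np.load('./purchase100.npz')
#   features, labels = data['features'], data['labels']
#   print(features.shape, features.dtype, labels.shape, labels.dtype)
#   (x_train, y_train), (x_test, y_test) = shuffled_split(features, labels)
```

A fixed seed keeps the split reproducible across runs; pass a different `seed` to draw a different partition.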

Requirements

This code has been tested with Python 3.8.5.

The requirements.txt file is automatically generated with pipreqs.

References

The code in this repo is based on the preprocessing scripts given in https://github.com/bargavj/EvaluatingDPML

License:MIT License

