cdavilarios / entity-embedding-rossmann

This is the code used in the paper "Entity Embeddings of Categorical Variables". If you want to get the original version of the code used for the Kaggle competition, please use the Kaggle branch.

To run the code, first download and unzip the train.csv and store.csv files from Kaggle and put them in this folder.
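Before running anything else, it can help to confirm that both files are where the scripts expect them. The following is a minimal sketch, not part of the repository; the helper name `missing_data_files` is hypothetical, and it assumes the files keep their original Kaggle names.

```python
from pathlib import Path

# File names as given in this README; the helper itself is illustrative only.
REQUIRED_FILES = ("train.csv", "store.csv")


def missing_data_files(folder="."):
    """Return the required Kaggle files that are not present in `folder`."""
    base = Path(folder)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files found.")
```

Run it from the repository folder; an empty result means both files are in place.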

The following packages are needed if you want to reproduce the results in the paper (we used Python 3):

pip3 install -U scikit-learn
pip3 install -U xgboost
pip3 install keras==1.2.2

Please refer to the Keras documentation for more details on how to install Keras. Note that the code uses the Keras 1.x API, so make sure to install the right version of Keras as shown above.
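Since the code depends on the 1.x API, a quick check of the installed version can save a confusing failure later. This is a sketch only, not part of the repository: the function names are hypothetical, and it assumes Python 3.8 or newer, where `importlib.metadata` is available in the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def is_keras_1x(version_string):
    """Return True if a version string such as '1.2.2' has major version 1."""
    return version_string.strip().split(".")[0] == "1"


def installed_keras_version():
    """Return the installed keras version string, or None if keras is absent."""
    try:
        return version("keras")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_keras_version()
    if found is None:
        print("keras is not installed")
    elif is_keras_1x(found):
        print("keras", found, "matches the 1.x API used by this code")
    else:
        print("keras", found, "is not 1.x; install keras==1.2.2 instead")
```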

Next, run the following scripts to extract the csv files and prepare the features:

python3 extract_csv_files.py
python3 prepare_features.py

To run the models:

python3 train_test_model.py
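If you prefer to run the three steps in one go, they can be chained from a small driver script. The sketch below is an assumption about how one might do this, not something shipped with the repository; `pipeline_commands` and `run_pipeline` are hypothetical names, and the script order simply follows this README.

```python
import subprocess
import sys

# Script names and order as listed in this README.
PIPELINE_SCRIPTS = (
    "extract_csv_files.py",
    "prepare_features.py",
    "train_test_model.py",
)


def pipeline_commands(python=sys.executable):
    """Build one command per script, in the order they must run."""
    return [[python, script] for script in PIPELINE_SCRIPTS]


def run_pipeline(python=sys.executable):
    """Run each script in turn; stop at the first one that fails."""
    for command in pipeline_commands(python):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` from the repository folder runs extraction, feature preparation, and training in sequence, and aborts if any step exits with an error.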

You can analyze the embeddings with the IPython notebook included. This is the learned embedding of German states plotted in 2D (with the Kaggle branch):

and these are the learned embeddings of the 1115 Rossmann stores plotted in 3D:
