kirajano / kaggle_rossmann

The repo for Data Science Retreat of the Rossmann Sales Predictions

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

image image

In this project, the Kaggle Rossman challenge is being taken on. The goal is to predict the Sales of a given store on a given day. Model performance is evaluated on the root mean square percentage error (RMSPE).

image

Please install the requirements.txt to reproduce the results:

pip install -r requirements.txt

It is recommended to use a new environment before running the pip install.

The dataset consists of three csv files: store.csv, train.csv and holdout.csv

Data Files:

  • train.csv holds info about each store.
  • store.csv holds the sales info per day for each store.
  • holdout. csv holds "unseen" data that the model is going to be evaluated on

The repo contains main.py that runs the main script from step one until the end.

The script can be run after cloning since all data used is in the repo.

Notes on main.py:

By default, the hyperparameter section is uncommented due to long completion time. The other sections will print out intermediate results for demonstration purposes on how the score looks like applying different models. The script can be ran individually and the last print out will be the RMSPE for the predictions of the holdout set. This is the final evalution method for the challenge.

The function.py file contains utility functions that are called in main.py.

The rossman_model.sav contains the pickled hypertuned model with the least RMSPE.

About

The repo for Data Science Retreat of the Rossmann Sales Predictions


Languages

Language:Python 100.0%