The project was part of the 2487-S2 Machine Learning course for the MSc in Business Analytics taught at Nova School of Business and Economics. The topic and scope of the project could be freely chosen by the students, based on given datasets.
- Frederik Søegaard - 44898
- Lennart Max Oser - 44379
- Niclas Frederic Sturm - 45914
This repository contains the prototype of a product recommender based on data from online grocer Instacart.
The goal was first to identify a business problem faced by e-commerce companies such as Instacart, second to explore the available data to understand what we could work with, and finally to prototype a product recommendation engine based on the products in a user's basket. In addition to the Jupyter notebooks, we also created a Command Line Interface (CLI) to play around with the recommendation engine. On top of that, we created an API to demonstrate how such an engine could be used as a microservice within a company (e.g. Instacart).
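To make the idea of a basket-based recommender concrete, here is a minimal sketch. It is not the code from this repository: the toy vectors, the product names, and the `recommend` helper are hypothetical, and averaging the basket's item vectors before ranking by cosine similarity is an assumption about one simple way such an engine can work with item embeddings.

```python
import math

# Hypothetical toy embeddings; a real engine would learn these from order data.
TOY_VECTORS = {
    "banana": [0.9, 0.1, 0.0],
    "strawberries": [0.8, 0.2, 0.1],
    "whole milk": [0.1, 0.9, 0.1],
    "yogurt": [0.2, 0.8, 0.2],
    "dish soap": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(basket, vectors, top_n=2):
    """Rank products outside the basket by similarity to the basket's mean vector."""
    known = [vectors[p] for p in basket if p in vectors]
    if not known:
        return []  # nothing in the basket has an embedding
    dim = len(known[0])
    centroid = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    candidates = [p for p in vectors if p not in basket]
    ranked = sorted(candidates, key=lambda p: cosine(centroid, vectors[p]), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    print(recommend(["banana"], TOY_VECTORS, top_n=1))
```

With these toy vectors, a basket containing only `banana` ranks `strawberries` first, because their vectors point in nearly the same direction.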
We divided the project into a total of 6 parts, numbered from `0` to `5`. Additionally, there is a `data` folder, which has to be created following the instructions below. Here is an overview of the structure:
├── 0_Introduction                   # the translation from business problem to ML problem
│   └── 0_Introduction.ipynb
├── 1_Exploratory_Data_Analysis      # classical EDA based on the six available data sets
│   └── 1_exploratory_data_analysis.ipynb
├── 2_Clustering                     # the feature engineering, a PCA and the actual clustering algorithm
│   └── 2_clustering.ipynb
├── 3_Item2Vec                       # the Item2Vec algorithm and the testing of the recommender engine
│   ├── 3_0_Item2Vec.ipynb
│   └── 3_1_Recommendation_Testing.ipynb
├── 4_Command_Line_Interface         # the Python file for CLI handling
│   ├── CLI_Specification.md
│   └── recommend_me_something.py
├── 5_Recommender_API                # the API
│   ├── API_Specification.md
│   ├── engine
│   │   └── recommender_engine.py
│   └── recommender_api.py
├── data                             # data folder with all the required data files
│   ├── aisles.csv
│   ├── departments.csv
│   ├── order_products__prior.csv
│   ├── order_products__train.csv
│   ├── orders.csv
│   ├── products.csv
│   └── sample_submission.csv
├── environment.yml
└── README.md
To run the code in the same environment as we did, please create a virtual environment by running the command `conda env create -f environment.yml`.
After doing so, you should be able to choose the new environment called `instacart` in your preferred IDE.
- In your CLI, run `mkdir data` or manually create a folder called `data`.
- Run `cd data` in your CLI to get into the right directory.
- Now run the following command to download the data: `kaggle competitions download -c instacart-market-basket-analysis`. If you prefer, you can also download the data manually.
- Extract the zip files using the CLI or whatever method you prefer.
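If you would rather script the extraction step, the following is a minimal sketch using only the Python standard library. It is not part of the repository: the helper names `extract_archives` and `missing_files` are hypothetical, and the sketch assumes the downloaded archive sits directly in the `data` folder and may itself contain further zip files. The expected file names are taken from the folder overview above.

```python
from pathlib import Path
import zipfile

# File names as listed in the data folder overview of this README.
EXPECTED_FILES = [
    "aisles.csv",
    "departments.csv",
    "order_products__prior.csv",
    "order_products__train.csv",
    "orders.csv",
    "products.csv",
    "sample_submission.csv",
]


def extract_archives(data_dir: Path) -> None:
    """Extract every .zip in data_dir, including archives unpacked along the way."""
    done = set()
    while True:
        pending = [p for p in sorted(data_dir.glob("*.zip")) if p not in done]
        if not pending:
            break
        for archive in pending:
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(data_dir)
            done.add(archive)


def missing_files(data_dir: Path) -> list[str]:
    """Return the expected CSV files that are not present in data_dir."""
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    data_dir = Path("data")
    extract_archives(data_dir)
    missing = missing_files(data_dir)
    print("All data files present." if not missing else f"Missing: {missing}")
```

Run it from the repository root after the download has finished. It reports any CSV file from the overview that is still missing.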