LMO94 / instacart-product-recommender

This repository contains the prototype of a product recommender based on data from online grocer Instacart. It was created as a group project for the Machine Learning Course for MSc Business Analytics at Nova School of Business and Economics.

🥕 Instacart Product Recommender

ℹ️ General Information

The project was part of the 2487-S2 Machine Learning course for the MSc in Business Analytics taught at Nova School of Business and Economics. The topic and scope of the project could be freely chosen by the students, based on given datasets.

👨‍💻 Group members

  • Frederik Søegaard - 44898
  • Lennart Max Oser - 44379
  • Niclas Frederic Sturm - 45914

💡 About the project

This repository contains the prototype of a product recommender based on data from online grocer Instacart.

The goal was threefold: first, to identify a business problem faced by e-commerce companies such as Instacart; second, to explore the available data to understand what we could work with; and finally, to prototype a product recommendation engine based on the products in a user's basket. In addition to the Jupyter notebooks, we created a Command Line Interface (CLI) for experimenting with the recommendation engine. We also created an API to demonstrate how such an engine could be used as a microservice within a company (e.g. Instacart).
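To make the idea of recommending from a basket concrete, here is a minimal sketch. It is not the code from this repository: it swaps the Item2Vec model for plain co-occurrence counting, and the function names and toy product names are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of products appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # set() so a product listed twice in one basket is only counted once
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts


def recommend(basket, counts, top_n=3):
    """Rank candidates by their summed co-occurrence with the items in the basket."""
    scores = Counter()
    for item in basket:
        scores.update(counts.get(item, {}))
    # Never recommend something that is already in the basket
    for item in basket:
        scores.pop(item, None)
    return [product for product, _ in scores.most_common(top_n)]


# Toy baskets with made-up product names (not taken from the Instacart data)
toy_baskets = [
    ["banana", "milk", "bread"],
    ["banana", "milk"],
    ["banana", "milk", "yogurt"],
    ["bread", "butter"],
]
counts = build_cooccurrence(toy_baskets)
print(recommend(["banana"], counts, top_n=1))  # → ['milk']
```

An embedding approach such as Item2Vec replaces the raw counts with learned product vectors, but the interface stays the same: a basket goes in and a ranked list of products comes out.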

🔎 Files overview

We divided the project into six parts, numbered 0 to 5. Additionally, there is a data folder, which has to be created following the instructions below. Here is an overview of the structure:

├── 0_Introduction                         # translating the business problem into an ML problem
│   └── 0_Introduction.ipynb
├── 1_Exploratory_Data_Analysis            # classical EDA based on the six available data sets
│   └── 1_exploratory_data_analysis.ipynb
├── 2_Clustering                           # feature engineering, a PCA and the actual clustering algorithm
│   └── 2_clustering.ipynb
├── 3_Item2Vec                             # the Item2Vec algorithm and the testing of the recommender engine
│   ├── 3_0_Item2Vec.ipynb
│   └── 3_1_Recommendation_Testing.ipynb
├── 4_Command_Line_Interface               # the Python file for CLI handling
│   ├── CLI_Specification.md
│   └── recommend_me_something.py
├── 5_Recommender_API                      # the API
│   ├── API_Specification.md
│   ├── engine
│   │   └── recommender_engine.py
│   └── recommender_api.py
├── data                                   # data folder with all the required data files
│   ├── aisles.csv
│   ├── departments.csv
│   ├── order_products__prior.csv
│   ├── order_products__train.csv
│   ├── orders.csv
│   ├── products.csv
│   └── sample_submission.csv
├── environment.yml
└── README.md
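As an illustration of how the order files in the data folder can be turned into baskets, the sketch below groups product ids by order id using only the standard library. The column names order_id and product_id are assumptions based on the public Kaggle dataset, and load_baskets is a hypothetical helper, not a function from this repository.

```python
import csv
from collections import defaultdict


def load_baskets(path, order_col="order_id", product_col="product_id"):
    """Group product ids by order id and return {order_id: [product_id, ...]}.

    The default column names are assumptions about the Kaggle CSV layout;
    pass different names if your files differ.
    """
    baskets = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # csv.DictReader yields strings, so ids stay strings here
            baskets[row[order_col]].append(row[product_col])
    return dict(baskets)
```

Called from the repository root as load_baskets("data/order_products__train.csv"), this should return one list of product ids per order. The notebooks may load the data differently, for example with pandas.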

💻 Usage

To run the code in the same environment as we did, create a conda environment with the following command: conda env create -f environment.yml

After doing so, you should be able to choose the new environment called instacart in your preferred IDE.

To download the data, follow these steps:

  1. In your CLI, run mkdir data or manually create a folder called data
  2. Run cd data in your CLI to change into that directory
  3. Run the following command to download the data: kaggle competitions download -c instacart-market-basket-analysis. If you prefer, you can instead download the data manually from the Kaggle competition page
  4. Extract the zip files using the CLI or whatever method you prefer
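Once the archives are extracted, a quick check that the data folder is complete can save a failed notebook run. The following is a minimal sketch, assuming it is run from the repository root; the file names are copied from the overview above, and missing_data_files is a hypothetical helper that is not part of the repository.

```python
from pathlib import Path

# File names copied from the "data" folder in the files overview
EXPECTED_FILES = [
    "aisles.csv",
    "departments.csv",
    "order_products__prior.csv",
    "order_products__train.csv",
    "orders.csv",
    "products.csv",
    "sample_submission.csv",
]


def missing_data_files(data_dir="data"):
    """Return the expected file names that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected data files are present.")
```

If a file is reported as missing, it may still be sitting inside an archive that has not been extracted yet.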
