tokakhaled / Instacart-Market-Basket-Analysis

Recommender system that predicts your next order based on your previous purchases. Also, it discuss the association between product purchases.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Instacart-Market-Basket-Analysis πŸ›’

graph

Project Description

  • This project exploits the power of data analysis and machine learning to take business to the next level!
  • It's one of the competiting projects in Data Science - Challenge Round 2 Hosted by Dr. Doaa Mahmoud.
  • The project consist of 3 sections
    • EDA and Interactive Dashboard by Power BI
    • Predictive Analysis Model
    • Association Rules

Open directly from kaggle Notebooks!

EDA and Interactive Dashboard by Power BI πŸ“Š

  • General Analysis.
  • Analyzing Behavour of users: users who always order same products.
  • How Time affects the purchasing behaviour of customers?
  • Analyzing products
  • Analyzing Organic Prodcuts.
  • Purchasing behaviour on Departments and Aisles.
graph
graph
graph
graph

Next Order Recommender/Predictor 🍞 🍟 🍩

A predictive analysis model , that predicts the products ordered in users' future order based on each purchasing history. Primary Key is the user-product pair to predict whether will be in the future order or not.

XGBoost Classifier was used.

Features with highest importance used by the model:

  • up_orders_since_last_order: measures how long the user hasn't considered buying a specific product.
  • up_order_rate_since_first_time: measures the degree a user like a product. It's the ratio by which a user will buy a product from the first moment he/she knew about it.
  • prod_reorder_ratio: measures how customers in general like a product.
  • user_reorder_ratio: measures how this user is likely to buy something new! Feature Importance

Association Rules 🍌 ➑️ πŸ…

Project Challenges πŸ™€

Next Order Recommender

  • Data is sparse, we have very large number of products and of course the customer will have very few in his/her next order. Data is very skewed to the negative class. Class distribution: 90% negative class, 10% positive class.
  • First, we've found that there's alot of false negatives, do We changed the threshold to maximize the recall, while keeping the precision above a certain threshold [0.3].
  • In ther words, we wanted to reduce, the false negatives, the number of products the model say user won't predict in the future while he/she will actually does. On the other side, it's okay to allow some false positives, when the model recommends a products the user will less likely buy in his/her next order.

The PR-Curve

Project Organization

./
β”œβ”€β”€ EDA
|     β”œβ”€β”€ eda-on-instacart-data.ipynb 
|         
β”œβ”€β”€ Model
|     |                                              
|     └── predictive-analysis-model.ipynb 
└── Business Insights 
      β”œβ”€β”€ Business Questions-Solutions.pdf                                              
      └── Project-Data Description.pdf

Todos

  • Applying dynamic thresholding on user's products according to his/her average basket size.
  • Applying seperate prediction on strong behaviour users.
  • Deploy the Model.

Contributors

Toka Khaled
Toka Khaled
Noran Hany
Noran Hany

About

Recommender system that predicts your next order based on your previous purchases. Also, it discuss the association between product purchases.


Languages

Language:Jupyter Notebook 100.0%