Instacart Market Basket Analysis with SQL (SQLite3)
Which products will an Instacart consumer purchase again? Data for this project was downloaded from https://www.kaggle.com/c/instacart-market-basket-analysis/data
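As a rough illustration of how CSV files like these can be loaded into SQLite and joined into a new table, here is a minimal sketch using Python's built-in `sqlite3` and `csv` modules. The column names follow the Kaggle files, but the rows are made up, and the table names (`orders`, `order_products`, `order_details`) and the `load_csv` helper are assumptions for illustration, not the project's actual schema or code.

```python
import csv
import io
import sqlite3

# Toy stand-ins for two of the Kaggle CSV files; the rows are invented.
ORDERS_CSV = """order_id,user_id,order_number,order_hour_of_day,days_since_prior_order
1,10,1,9,
2,10,2,14,7
3,11,1,18,
"""
ORDER_PRODUCTS_CSV = """order_id,product_id,add_to_cart_order,reordered
1,196,1,0
2,196,1,1
2,14084,2,0
3,12427,1,0
"""


def load_csv(conn, table, text):
    """Create `table` from CSV text (hypothetical helper, not from the project)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # NUMERIC affinity lets SQLite store numeric-looking text as numbers.
    columns = ", ".join(f"{name} NUMERIC" for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {table} ({columns})")
    # Empty CSV fields (e.g. a first order with no prior order) become NULL.
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        ([value if value != "" else None for value in row] for row in reader),
    )


conn = sqlite3.connect(":memory:")
load_csv(conn, "orders", ORDERS_CSV)
load_csv(conn, "order_products", ORDER_PRODUCTS_CSV)

# Build a new table by joining each purchased product to its order.
conn.execute("""
    CREATE TABLE order_details AS
    SELECT op.order_id, op.product_id, op.add_to_cart_order, op.reordered,
           o.user_id, o.order_number, o.order_hour_of_day,
           o.days_since_prior_order
    FROM order_products AS op
    JOIN orders AS o ON o.order_id = op.order_id
""")

rows = conn.execute("""
    SELECT order_id, product_id, order_number, reordered
    FROM order_details
    ORDER BY order_id, add_to_cart_order
""").fetchall()
for row in rows:
    print(row)
```

With the real files, the same helper could read from `open(path, newline="")` instead of an in-memory string, and the database could be a file on disk rather than `:memory:`.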
The CSV files were loaded into SQLite, and new tables were created by joining the loaded tables. Following visual analysis of the data, features were chosen for the machine learning algorithms. Feature selection was performed with both SelectKBest and LASSO, which gave similar results. The top 6 features (order_number, add_to_cart_order, days_since_prior_order, order_hour_of_day, product_id, order_id) were chosen as the best predictors of the products in a customer's next order. Eight machine learning classification algorithms were tested:
- Logistic Regression
- SVM_rbf
- SVM_sigmoid
- Gaussian Naive Bayes
- SVM_linear
- Decision Tree
- Random Forest
- K-Nearest Neighbors

Random Forest proved to be the best algorithm for this classification task. Its parameters were then optimized, which gave slightly better results. Overall, a customer's next order could be predicted with over 76% confidence.
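
The workflow described above (select the top 6 features, compare the eight classifiers, then tune Random Forest) might look roughly like the following scikit-learn sketch. It runs on synthetic data from `make_classification` rather than the Instacart tables, covers only the SelectKBest half of the feature selection, and the hyperparameter grid, fold counts, and scoring function are assumptions, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the joined order table (assumption: binary target).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, random_state=0
)

# The eight classifiers listed above, with mostly default settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM_rbf": SVC(kernel="rbf"),
    "SVM_sigmoid": SVC(kernel="sigmoid"),
    "Gaussian Naive Bayes": GaussianNB(),
    "SVM_linear": SVC(kernel="linear"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}

# Scale, keep the 6 highest-scoring features, then classify. Doing the
# selection inside the pipeline keeps it within each cross-validation fold.
scores = {}
for name, model in models.items():
    pipeline = make_pipeline(
        StandardScaler(), SelectKBest(score_func=f_classif, k=6), model
    )
    scores[name] = cross_val_score(pipeline, X, y, cv=5).mean()

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")

# Tune Random Forest over a small, illustrative grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3,
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

The ranking on synthetic data says nothing about the ranking on the Instacart data; the sketch only shows the shape of the comparison.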