The objective of this project is to predict whether a customer will purchase a product in a session based on their interactions with various items and categories.
The dataset consists of the following files:
train.dat
: 4,072,954 rows x 5 columns, with 1,125,000 unique session_id values
test.dat
: 1,040,614 rows x 4 columns, with 306,825 unique session_id values
sample_submission.csv
: 100 rows x 2 columns (session_id, label)
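The figures above (rows, columns, distinct sessions) can be reproduced with a few lines of pandas. This is a minimal sketch with several assumptions: the comma separator, the absence of a header row, and the column order are guesses, because the source does not describe the `.dat` layout. `load_dat` and `summarize` are hypothetical helper names, and the in-memory rows are made-up toy values, not project data.

```python
import io

import pandas as pd


def load_dat(source, sep=",", names=None):
    """Read one of the .dat files into a DataFrame (hypothetical helper).

    The separator and header handling are assumptions: pass `names`
    to treat the file as header-less with the given column names.
    """
    header = None if names is not None else "infer"
    return pd.read_csv(source, sep=sep, header=header, names=names)


def summarize(df):
    """Return the row count, column count and number of distinct sessions."""
    return {
        "rows": int(len(df)),
        "columns": int(df.shape[1]),
        "sessions": int(df["session_id"].nunique()),
    }


# Toy, header-less stand-in for train.dat (made-up values; column order assumed).
toy = io.StringIO(
    "1,2014-04-01 10:00:00,0,111,0\n"
    "1,2014-04-01 10:00:30,0,222,0\n"
    "2,2014-04-02 09:15:00,3,111,1\n"
    "2,2014-04-02 09:16:00,3,333,1\n"
)
train = load_dat(
    toy, names=["session_id", "timestamp", "category", "item_id_code", "label"]
)
print(summarize(train))  # {'rows': 4, 'columns': 5, 'sessions': 2}
```

On the real `train.dat`, the same call should report 4,072,954 rows, 5 columns and 1,125,000 sessions if the layout assumptions hold.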
Each file uses some or all of the following columns:
session_id
: Unique identifier for each user session
timestamp
: Timestamp of the session
category
: Category of the items the user interacted with during the session (there may be one or more)
item_id_code
: Unique code for each product
label
: Target outcome to predict: whether or not the user made a purchase in that session
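Because the files contain one row per interaction while the label is defined per session, the rows have to be collapsed to one row per session before modeling. The sketch below shows one way to do this with pandas named aggregation. The specific features (interaction count, distinct items, distinct categories, session duration) are illustrative choices, not necessarily the project's, and it assumes `label` is constant within a session and that `category` holds a single value per row.

```python
import pandas as pd


def session_features(df):
    """Collapse interaction rows into one row of aggregate features per session.

    Illustrative feature set; `label` is assumed constant within a session.
    """
    df = df.assign(timestamp=pd.to_datetime(df["timestamp"]))
    spec = {
        "n_interactions": ("item_id_code", "size"),
        "n_unique_items": ("item_id_code", "nunique"),
        "n_unique_categories": ("category", "nunique"),
        "first_ts": ("timestamp", "min"),
        "last_ts": ("timestamp", "max"),
    }
    if "label" in df.columns:  # presumably present in train.dat only
        spec["label"] = ("label", "max")
    feats = df.groupby("session_id").agg(**spec)
    # Session length in seconds, from first to last interaction.
    feats["duration_sec"] = (feats["last_ts"] - feats["first_ts"]).dt.total_seconds()
    return feats.drop(columns=["first_ts", "last_ts"]).reset_index()


# Toy interactions (made-up values) for two sessions.
rows = pd.DataFrame({
    "session_id": [1, 1, 1, 2],
    "timestamp": ["2014-04-01 10:00:00", "2014-04-01 10:00:30",
                  "2014-04-01 10:01:00", "2014-04-02 09:00:00"],
    "category": [0, 0, 5, 3],
    "item_id_code": [111, 222, 111, 333],
    "label": [0, 0, 0, 1],
})
feats = session_features(rows)
print(feats)
```

Applied to the test file (no `label` column), the same function yields a feature table with one row per session_id, which matches the granularity of the submission file.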
- Problem Statement
- Data Summary
- Approach Overview
- Installation
- Imports
- Data Loader
- Exploratory Data Analysis (EDA)
- Feature Engineering/Extraction
- Feature Encoding
- Feature Selection
- Modeling
- Model Evaluation/Selection
- Hyperparameter Tuning
- Prediction on the Test Set
- Neural Network-Based Modeling
This project involves preprocessing the data, conducting exploratory data analysis, engineering relevant features, encoding categorical variables, selecting features, training models, evaluating them and selecting the best-performing one, tuning hyperparameters, and finally making predictions on the test set. A neural network approach based on a multilayer perceptron (MLP) is also explored.
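The modeling steps above (feature selection, model comparison, hyperparameter tuning, and an MLP alternative) can be sketched with scikit-learn. This is a hedged illustration, not the project's actual configuration. It uses synthetic data from `make_classification` in place of the session-level features, `SelectKBest` as a stand-in for whatever selection method is used, logistic regression as an assumed baseline, and ROC AUC as an assumed metric. Categorical encoding is omitted because the synthetic features are numeric; a `ColumnTransformer` with a `OneHotEncoder` would slot in before scaling.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the session-level feature matrix (not project data).
X, y = make_classification(n_samples=600, n_features=12, n_informative=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def make_search(estimator, grid):
    """Scale -> keep the k best features -> classifier, tuned by 3-fold grid search."""
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_classif)),
        ("clf", estimator),
    ])
    return GridSearchCV(pipe, grid, scoring="roc_auc", cv=3)


candidates = {
    "logreg": make_search(
        LogisticRegression(max_iter=1000),
        {"select__k": [5, 10], "clf__C": [0.1, 1.0]},
    ),
    "mlp": make_search(
        MLPClassifier(max_iter=1000, random_state=0),
        {"select__k": [5, 10], "clf__hidden_layer_sizes": [(16,), (32, 16)]},
    ),
}

scores = {}
for name, search in candidates.items():
    search.fit(X_train, y_train)
    # Score the tuned (refit) pipeline on the held-out validation split.
    scores[name] = roc_auc_score(y_val, search.predict_proba(X_val)[:, 1])

best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Keeping scaling and selection inside the `Pipeline` means they are refit within each cross-validation fold, so the grid search does not leak information from validation folds into preprocessing.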