roccurve lgbm kmeans-clustering personas

Customer_Segmentation

An automobile company has plans to enter new markets with their existing products.

In their existing market, the sales team has classified all customers into 4 segments (A, B, C, D). They then ran targeted outreach and communication for each customer segment.

The task is to help the manager predict the right segment for the new customers.

Goals of the project

Perform an exploratory data analysis with visualizations.
Use supervised machine learning models to predict the customer segment.
Use the unsupervised learning model K-Means to create new clusters.
Create personas for the new clusters.

Tools used

Pandas
Numpy
Matplotlib
Seaborn
Time
scikit-learn
- Supervised learning models for classification:
  - Support Vector Machine
  - Gradient Boosting Classifier
  - Light Gradient Boosting Classifier
  - Ada Boost
  - Cat Boost
  - Decision Tree
  - Random Forest
  - Logistic Regression
  - KNeighbors
  - Naive Bayes Gaussian
- Unsupervised Learning model for clustering:
  - K-Means
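
As a rough illustration of how several of the listed classifiers could be compared, here is a minimal sketch. It runs on synthetic four-class data from `make_classification` rather than the Kaggle dataset, and the model settings, fold count, and the `results` dictionary are assumptions made for the example, not the notebook's actual configuration.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the customer data: 4 classes, like segments A-D.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6, n_classes=4, random_state=0
)

# A subset of the models from the list above (settings are illustrative).
models = {
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNeighbors": KNeighborsClassifier(),
    "Naive Bayes Gaussian": GaussianNB(),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    scores = cross_val_score(model, X, y, cv=3, scoring="accuracy")
    results[name] = {
        "accuracy": scores.mean(),
        "seconds": time.perf_counter() - start,
    }

# Print the models from best to worst mean cross-validated accuracy.
for name, r in sorted(results.items(), key=lambda kv: kv[1]["accuracy"], reverse=True):
    print(f"{name:22s} accuracy={r['accuracy']:.3f} time={r['seconds']:.2f}s")
```

Timing each model alongside its score makes it easier to weigh accuracy against training cost when shortlisting models.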

Resources

Customer Segmentation https://www.kaggle.com/vetrirah/customer

Classification models performance

The best-performing models are:

Gradient Boosting Classifier:
- Accuracy: 53.62% - Precision: 52.77% - F1 score: 53% - Recall: 52.49%
Light Gradient Boosting Classifier:
- Accuracy: 52% - Precision: 49.86% - F1 score: 49.75% - Recall: 50%
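
For a four-class problem, precision, recall, and F1 have to be averaged across classes. The README does not say which averaging was used, so the sketch below assumes macro averaging; the labels in `y_true` and `y_pred` are made up for illustration.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical true and predicted segments for eight customers.
y_true = ["A", "A", "B", "B", "C", "C", "D", "D"]
y_pred = ["A", "B", "B", "B", "C", "D", "D", "D"]

accuracy = accuracy_score(y_true, y_pred)
# average="macro" computes the metric per class, then takes the unweighted mean.
precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")

print(f"Accuracy:  {accuracy:.2%}")
print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"F1 score:  {f1:.2%}")
```

With `average="weighted"` the per-class scores would instead be weighted by class frequency, which matters when the segments are imbalanced.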

K-Means cluster Personas creation

After clustering the data points into four clusters, I created a persona for each cluster.
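
A cluster profile of the kind personas are typically built from could be sketched as follows. The data is randomly generated, and the column names `Age`, `Work_Experience`, and `Family_Size` are assumptions chosen for illustration; the notebook's actual features and preprocessing may differ.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Randomly generated stand-in for the customer table (hypothetical columns).
rng = np.random.default_rng(0)
customers = pd.DataFrame({
    "Age": rng.integers(18, 80, 300),
    "Work_Experience": rng.integers(0, 15, 300),
    "Family_Size": rng.integers(1, 9, 300),
})

# K-Means is distance based, so features are standardized first.
scaled = StandardScaler().fit_transform(customers)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
customers["cluster"] = kmeans.fit_predict(scaled)

# Average feature values per cluster, plus cluster size, as raw persona material.
profile = customers.groupby("cluster").mean().round(1)
profile["size"] = customers["cluster"].value_counts().sort_index()
print(profile)
```

Each row of `profile` summarizes one cluster; a persona is then a short human-readable description of that row.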

Process

Exploratory Data Analysis
- Clean the dataset
- Create visualizations
Feature Engineering
- Create dummy variables
- Scaling
- Feature selection with Sequential Feature Selection (SFS) and Recursive Feature Elimination (RFE)
- PCA
Modeling for classification: select the best-performing models (Gradient Boosting Classifier and Light Gradient Boosting Classifier)
Hyperparameter tuning of the Gradient Boosting Classifier and Light Gradient Boosting Classifier
Clustering with K-Means
Create personas
Create the storytelling
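
The feature engineering and tuning steps above can be sketched as a single scikit-learn pipeline. This is only a minimal illustration on random data: the column names, the number of selected features and PCA components, and the parameter grid are all assumptions, and only RFE is shown for feature selection.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Random stand-in data with hypothetical numeric and categorical columns.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(18, 80, n),
    "Family_Size": rng.integers(1, 9, n),
    "Work_Experience": rng.integers(0, 15, n),
    "Profession": rng.choice(["Artist", "Doctor", "Engineer"], n),
    "Spending_Score": rng.choice(["Low", "Average", "High"], n),
})
y = rng.choice(list("ABCD"), n)

# Step 1: dummy variables for the categorical columns.
X = pd.get_dummies(df, dtype=float)

# Steps 2-5: scaling, RFE feature selection, PCA, then the classifier.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)),
    ("pca", PCA(n_components=3)),
    ("clf", GradientBoostingClassifier(random_state=0)),
])

# Step 6: a small grid search over two classifier hyperparameters.
param_grid = {
    "clf__n_estimators": [20, 40],
    "clf__learning_rate": [0.05, 0.1],
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

Keeping the scaler, selector, and PCA inside the pipeline means they are refit on each training fold, which avoids leaking information from the validation folds into the preprocessing.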

Presentation

To see the presentation, click on the picture below.


Languages

Language:Jupyter Notebook 100.0%