imnikhilanand / Who-Is-More-Likely-To-Buy

Uplift modeling to identify the persuadable users among all users, so that encouragement (coupons or other offers) can be sent to them to buy more, without spending resources on users who are unwilling or uninterested in buying the product even after encouragement.

Uplift Modeling

Introduction

Uplift modeling is a technique that aims to find the subset of customers who would be most influenced by an action, and who therefore create the most business value. Identifying these segments of users can substantially improve the return on the investment made.

Let's take an example where a company wants to send marketing emails to its users. We would expect the users who receive the emails to buy more, increasing the conversion rate. This may be true for the population as a whole, but if we dig deeper into user behaviour, we find four segments:

  • Sure Things: users who will buy the product (or convert) whether or not they receive the emails.
  • Lost Causes: users who won't buy the product whether or not they receive the emails.
  • Sleeping Dogs: users who are less likely to buy if they receive the emails, i.e. the treatment has a negative effect on them.
  • Persuadables: users who will make a purchase (or convert) only if they receive the emails.

The goal of uplift modeling is to identify these groups. We have to find the Persuadables and focus our efforts on their purchase (or conversion). We should not waste resources on Sure Things and Lost Causes, whose behaviour the emails do not change, and we would never want to bother Sleeping Dogs, who react negatively to them.

Since we have to find out whether our promotional campaign is working, we can treat the problem as a randomized controlled experiment: we treat (i.e. send promotions to) one group of users, the treatment group, and take no action on another set of users, the control group. If the average purchase rate is higher in the treatment group than in the control group, the promotion is encouraging users to purchase more. This difference is called the Average Treatment Effect (ATE).
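As a minimal sketch of the ATE calculation described above, the snippet below takes the difference in mean outcome between the two groups. The function name and the toy outcome lists are hypothetical and are not taken from this repository.

```python
from statistics import mean


def average_treatment_effect(treated_outcomes, control_outcomes):
    """Difference in mean outcome between the treatment and control groups."""
    return mean(treated_outcomes) - mean(control_outcomes)


# Hypothetical toy data: 1 = purchased, 0 = did not purchase.
treated = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]  # 5 of 10 purchased
control = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]  # 3 of 10 purchased

ate = average_treatment_effect(treated, control)
print(f"ATE = {ate:.2f}")  # ATE = 0.20
```

A positive value means the treated group converted more often on average. It says nothing yet about which individual users were responsible for the increase.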

However, the overall increase in the treatment group may be driven by a certain set of users who purchase because of the treatment. If we can somehow identify those persuadable customers ahead of time, we will be able to concentrate our resources on them.

When we determine how the treatment effect varies from person to person, conditional on the different traits these people have, we are looking for the individual treatment effect (ITE), also called the conditional average treatment effect (CATE). This is where machine learning and its predictive power come in.

Technique

A classical technique for estimating the individual treatment effect is to predict, for each individual, the likelihood of a purchase if they were treated and if they were not treated. These two probabilities are then subtracted to obtain the uplift: how much more likely is a purchase if the treatment is given?

The modeling can be accomplished in two ways:

  • One method is to train a single model that takes the treatment flag as an input feature. At inference time, two instances of each data point are created, one with treatment = 1 and one with treatment = 0, and both are scored. This is called the 'S-Learner' approach, since it uses a Single model.

  • The other method is to train two separate models, one on the treatment group and one on the control group. In the inference phase, the treatment and control models are both used to obtain predictions for each instance. This is called the 'T-Learner' approach, since it uses Two models.
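The two approaches can be sketched side by side. To stay dependency-free, the example below uses a toy lookup-table "model" (`RateTable`, a hypothetical stand-in for a real classifier such as XGBoost) and made-up rows; it only illustrates how the two learners are wired, not how this repository implements them.

```python
from collections import defaultdict


class RateTable:
    """Toy model: predicts the observed conversion rate for each input key.

    Assumption: a stand-in for a real classifier, used only for illustration.
    """

    def fit(self, keys, outcomes):
        totals = defaultdict(lambda: [0, 0])  # key -> [conversions, count]
        for key, y in zip(keys, outcomes):
            totals[key][0] += y
            totals[key][1] += 1
        self.rates = {k: conv / n for k, (conv, n) in totals.items()}
        return self

    def predict(self, key):
        return self.rates[key]


# Hypothetical rows: (segment, treatment, converted)
rows = [
    ("new", 1, 1), ("new", 1, 1), ("new", 1, 0), ("new", 1, 1),
    ("new", 0, 0), ("new", 0, 1), ("new", 0, 0), ("new", 0, 0),
    ("loyal", 1, 1), ("loyal", 1, 1), ("loyal", 1, 0), ("loyal", 1, 1),
    ("loyal", 0, 1), ("loyal", 0, 1), ("loyal", 0, 1), ("loyal", 0, 0),
]

# S-Learner: a single model, with the treatment flag as part of the input.
s_model = RateTable().fit([(x, t) for x, t, _ in rows], [y for _, _, y in rows])


def s_learner_uplift(x):
    # Score the same data point twice: treatment = 1 and treatment = 0.
    return s_model.predict((x, 1)) - s_model.predict((x, 0))


# T-Learner: two models, one fitted on each group.
treat_model = RateTable().fit(
    [x for x, t, _ in rows if t == 1], [y for _, t, y in rows if t == 1]
)
control_model = RateTable().fit(
    [x for x, t, _ in rows if t == 0], [y for _, t, y in rows if t == 0]
)


def t_learner_uplift(x):
    return treat_model.predict(x) - control_model.predict(x)


for segment in ("new", "loyal"):
    print(segment, s_learner_uplift(segment), t_learner_uplift(segment))
```

With this lookup table the two learners agree exactly (uplift 0.5 for "new", 0.0 for "loyal"). With real models they generally differ, because the S-Learner shares one set of parameters across both groups while the T-Learner fits each group independently.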

Dataset Description

This is a synthetic dataset created for research purposes.

Synthetic Data Set for Uplift Modeling [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3653141

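| S No. | Feature Name        | Feature Description    |
|-------|---------------------|------------------------|
| 1     | treatment_group_key | Experiment group label |
| 2     | conversion          | Outcome variable       |
| 3     | x1_informative      | informative feature    |
| 4     | x2_informative      | informative feature    |
| ...   | ...                 | ...                    |
| 11    | x9_informative      | informative feature    |
| 12    | x10_informative     | informative feature    |
| 13    | x11_irrelevant      | irrelevant feature     |
| 14    | x12_irrelevant      | irrelevant feature     |
| ...   | ...                 | ...                    |
| 31    | x29_irrelevant      | irrelevant feature     |
| 32    | x30_irrelevant      | irrelevant feature     |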

Exploratory Data Analysis

Data points in each of the two groups (Control and Treatment)

| Group     | Data points |
|-----------|-------------|
| Control   | 5000        |
| Treatment | 5000        |

Percentage of conversions in the two groups (Control and Treatment)

| Group     | Conversion Rate |
|-----------|-----------------|
| Control   | 0.2670          |
| Treatment | 0.3712          |

Let's observe the results of the proportion Z-test between the two groups.

| Statistic | Value    |
|-----------|----------|
| Z-stat    | -11.17   |
| P-value   | 5.27e-29 |

Observations:

  • The data points are randomized and equally distributed between the control and treatment groups.
  • The ATE (Average Treatment Effect) is positive and is approximately 10 percentage points (0.3712 - 0.2670 = 0.1042).
  • The proportion Z-test shows that the difference in conversion between the two groups is statistically significant, as the p-value is far below 0.05.
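The figures in these observations can be cross-checked from the group sizes and conversion rates alone. The sketch below assumes a pooled two-proportion z-test with a two-sided p-value; the README does not say which implementation produced the reported statistics, so treat the formulation as an assumption.

```python
import math

# Group sizes and conversion rates from the tables above.
n_control, n_treatment = 5000, 5000
p_control, p_treatment = 0.2670, 0.3712

# Average treatment effect: difference in conversion rates.
ate = p_treatment - p_control

# Pooled conversion rate under the null hypothesis of no difference.
p_pooled = (p_control * n_control + p_treatment * n_treatment) / (
    n_control + n_treatment
)
std_err = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_control + 1 / n_treatment))

# Control minus treatment, which gives a negative statistic as in the table.
z_stat = (p_control - p_treatment) / std_err

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z_stat) / math.sqrt(2))

print(f"ATE = {ate:.4f}")
print(f"z = {z_stat:.2f}, p = {p_value:.2e}")
```

Under these assumptions the results land close to the reported Z-stat and P-value; small differences in the last digit can come from rounding of the conversion rates.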

These observations clearly indicate that we can move ahead with uplift modeling. We will build a machine learning model that predicts how likely each user is to purchase the product. Once the model is built, we will use it to estimate, for each individual, the difference in conversion probability with and without treatment, to see who is likely to convert because of the treatment.

Modeling

S-Learner

A single model (S-Learner) was developed to predict the binary outcome (conversion). XGBoost was used for this model, which resulted in an AUC score of 0.7554.

Hyperparameters

| Hyperparameter | Value |
|----------------|-------|
| eta            | 0.1   |
| max_depth      | 5     |
| alpha          | 1     |
| gamma          | 1     |
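For reference, these values could be written as an XGBoost parameter dictionary along the following lines. The `objective` and `eval_metric` entries are assumptions inferred from the binary outcome and the reported AUC; they are not stated in this README.

```python
# Hyperparameters listed above, as an XGBoost-style parameter dictionary.
params = {
    "eta": 0.1,        # learning rate
    "max_depth": 5,    # maximum tree depth
    "alpha": 1,        # L1 regularization on leaf weights
    "gamma": 1,        # minimum loss reduction required to split a node
    # Assumptions, not stated in the README:
    "objective": "binary:logistic",
    "eval_metric": "auc",
}

# A dictionary like this is typically passed to xgboost.train(params, dtrain, ...),
# where dtrain is an xgboost.DMatrix built from the features plus the treatment flag.
```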

Model Performance

Model Evaluation

Now that we have our uplift model, we score every test sample twice, once with the treatment variable set to 0 and once with it set to 1. The difference between the two predictions for each data point is its uplift score.

To visualize the uplift score, let's plot the density function.

Observations:

  • The density function of the uplift score shows that the overall uplift is positive, as most of the mass of the curve lies to the right of 0.
  • There is negative uplift too, which means the dataset also contains Sleeping Dogs, i.e. users who are less likely to convert when treated.
  • There is a major peak at 0, which means there is a significant number of users whose behaviour the treatment does not change (Sure Things and Lost Causes).
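The same reading of the density plot can be done numerically by counting how many uplift scores are negative, close to zero, or positive. This is a minimal sketch: the function name, the `tolerance` threshold, and the scores are hypothetical.

```python
def summarize_uplift(scores, tolerance=0.01):
    """Share of uplift scores that are negative, near zero, and positive."""
    n = len(scores)
    negative = sum(1 for s in scores if s < -tolerance)
    positive = sum(1 for s in scores if s > tolerance)
    near_zero = n - negative - positive
    return {
        "negative": negative / n,
        "near_zero": near_zero / n,
        "positive": positive / n,
    }


# Hypothetical uplift scores (treated prediction minus untreated prediction).
scores = [-0.08, -0.02, 0.0, 0.0, 0.005, 0.03, 0.06, 0.12, 0.20, 0.31]

summary = summarize_uplift(scores)
print(summary)  # {'negative': 0.2, 'near_zero': 0.3, 'positive': 0.5}
```

The choice of `tolerance` decides what counts as "no effect", so the shares should be read as a rough summary rather than a hard segmentation of users.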

Analysis

For uplift modeling, we cannot rely only on the AUC or accuracy score of the classifier, since these measure how well conversion is predicted, not how well the treatment effect is ranked. We use quantile plots instead. Quantile plots are one of the easiest ways to find out whether the model is working well.

The idea is to create bins of data points based on the uplift score. Within each bin, we check whether there is a significant uplift, i.e. a difference in conversion rate between the treatment and control data points in that bin. If the model works well, we will observe a large positive difference in the higher deciles, and the difference will become smaller as the uplift score comes down. In other words, as the uplift score increases, the true uplift between the control and treatment groups also increases.
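The binning procedure described above can be sketched as follows. The function name, the record layout, and the toy data are hypothetical; with real data one would typically use 10 bins (deciles).

```python
def uplift_by_quantile(records, n_bins):
    """Observed uplift per bin of predicted uplift score.

    records: list of (uplift_score, treatment, converted) tuples.
    Returns treatment rate minus control rate for each bin, ordered from the
    lowest-scored bin to the highest. A bin lacking either group yields None.
    """
    ranked = sorted(records, key=lambda r: r[0])
    bin_size = len(ranked) // n_bins
    results = []
    for b in range(n_bins):
        start = b * bin_size
        # The last bin absorbs any remainder.
        end = len(ranked) if b == n_bins - 1 else start + bin_size
        chunk = ranked[start:end]
        treated = [y for _, t, y in chunk if t == 1]
        control = [y for _, t, y in chunk if t == 0]
        if not treated or not control:
            results.append(None)
            continue
        results.append(sum(treated) / len(treated) - sum(control) / len(control))
    return results


# Hypothetical records: (predicted uplift score, treatment flag, converted).
records = [
    (0.01, 1, 0), (0.02, 0, 0), (0.03, 1, 1), (0.04, 0, 1),
    (0.21, 1, 1), (0.25, 0, 0), (0.30, 1, 1), (0.35, 0, 1),
]

print(uplift_by_quantile(records, n_bins=2))  # [0.0, 0.5]
```

In this toy data the low-score bin shows no observed uplift and the high-score bin shows 0.5, which is the pattern a well-ranked uplift model should produce.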

Let's visualize the number of data points for control and treatment under each of the quantiles.

Now let's visualize the bins themselves and the average uplift in each of them.

Observations:

  • The true uplift increases with the quantile range, as expected.
  • The average treatment effect was ~10%. But if we observe the uplift bin-wise, we can see that it can go up to ~39% for a certain set of users.

Languages

Language:Python 100.0%