fakhrirobi/uplift_modelling

E## Background

Problem Description

An e commerce company just established. The company wants to grow their user base ? Why its important

The company is in stage 4 to scale to a bigger market. Of course expannding user base require effort. We cannot wait for the miracle to come up

How can we grow user ? To grow user base we can divide into two types

Organic Growth
Non Organic Growth

So what is the different between both

Organic Growth refer to condition where user willingfully to use our product without any intervention. That could happen because of the quality of our product offered and many more

The first type is quite rare, and taking some times.. What we can do about it ?

We can intervene user growth through some intervention. So that is the idea of Non Organic Growth

Why we need to intervene, its because :

User might not know yet our product (exposure)
User finding difficulties in finding value proposition in our product

So what kind of intervention we need to do ?

Educating your potential customer all about your product

This could be done through detailed explanation of your product and why they should use your product

Offer Incentive to potential customer

The next one is more attractive since customer, can receive trial / free product hence they will have a quick impression of the product, if the product is good the more likely they will convert / buy again

Our product has been crafted carefully, and our social media team already performed maximum, we will try the second approach by offering incentive to potential customer

Now, the question who sould we give incentive ?

Source

Our goal :

Target somebody whose likely to continue purchase our product after getting incentive

We know that individual / user may react differently to incentive, we can summary their behaviour using matrix

Treatment and Conversion Matrix

Source : Author

How can we know given a treatment user will convert ?

This is study of causal inference

Most of causal inference method usually take care of Average Treatment Effect or how the average effect when people receive a treatment to outcome variable, some methods are available such as :

Backdoor Approach :
1. Matching
2. Fixed Effect
3. Difference in Difference (DiD)
Frontdoor Approaach :
1. Regression Discontinuity Design
2. Instrumental Variable

Can we use those method for our case ? Unfortunately no. Its because we are interested in individual (heterogenous) effect or sometimes called as Conditional Treatment Effect (CATE)

Business Objective

Our business problem is to expand base user or user acquisition. Our current approach is to give treatment in terms of incentive to measure whether our treatment is success we would like to measure how many user are converted given treatment they receive. Hence, our business metrics is Conversion Rate

$\text{Conversion Rate} = \cfrac{\text{Number of user who receive treatment and Converted}} {\text{Number of user who get treatement} }$

Metrics Context

The higher the conversion rate is better.

Source

Solution

To find customer who eligible to receive incentive we can , predict uplift score of user who receive treatment, of course predicting uplift score is not the end.

After prediction we need a policy what we should do? All observation in previous experiment does have uplift value, but which percentile of uplift score we are going to select as treatment eligible customer is important

from Imgflip Meme Generator

Model Metrics

Some available metrics :

Area Under the Curve of Uplift
Area Under the Curve of Gain
Area Under the Curve of Qini

For now we will use Uplift

Data Description

recency

months since last purchase

history

value of the historical purchases

used_discount

indicates if the customer used a discount before

used_bogo

indicates if the customer used a buy one get one before

zip_code

class of the zip code as Suburban/Urban/Rural

is_referral

indicates if the customer was acquired from referral channel

channel

channels that the customer using, Phone/Web/Multichannel

offer

the offers sent to the customers, Discount/But One Get One/No Offer

conversion

customer conversion(buy or no)

Workflow

1. Data Preparation

Load Data

import pandas as pd 
import numpy as np

data_path = 'data.csv'

# read data 
treatment_data = pd.read_csv(data_path)
treatment_data.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	recency	history	used_discount	used_bogo	zip_code	is_referral	channel	offer
0	10	142.44	1	0	Surburban	0	Phone	Buy One Get One
1	6	329.08	1	1	Rural	1	Web	No Offer
2	7	180.65	0	1	Surburban	1	Web	Buy One Get One
3	9	675.83	1	0	Rural	1	Web	Discount
4	2	45.34	1	0	Urban	0	Web	Buy One Get One

treatment_data.columns

Index(['recency', 'history', 'used_discount', 'used_bogo', 'zip_code',
       'is_referral', 'channel', 'offer', 'conversion'],
      dtype='object')

Check data shapes & types

### Check Shape 
treatment_data.shape

(64000, 9)

treatment_data.dtypes

recency            int64
history          float64
used_discount      int64
used_bogo          int64
zip_code          object
is_referral        int64
channel           object
offer             object
conversion         int64
dtype: object

we can see that our variable match our data type

Data Preprocessing

Label Encoding

# channel 
treatment_data['channel'].unique()

array(['Phone', 'Web', 'Multichannel'], dtype=object)

# offer 
treatment_data['offer'].unique()

array(['Buy One Get One', 'No Offer', 'Discount'], dtype=object)

treatment_data = treatment_data.loc[treatment_data['offer'].isin(['Buy One Get One', 'No Offer'])]

treatment_data['treatment'] = treatment_data['offer'].map({
    'Buy One Get One':1, 'No Offer':0, 'Discount':1
})

treatment_encoded = pd.get_dummies(treatment_data.drop(['offer'],axis=1))

treatment_encoded

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	recency	history	used_discount	used_bogo	is_referral	conversion	treatment	zip_code_Rural	zip_code_Surburban	zip_code_Urban	channel_Multichannel	channel_Phone	channel_Web
0	10	142.44	1	0	0	0	1	0	1	0	0	1	0
1	6	329.08	1	1	1	0	0	1	0	0	0	0	1
2	7	180.65	0	1	1	0	1	0	1	0	0	0	1
4	2	45.34	1	0	0	0	1	0	0	1	0	0	1
5	6	134.83	0	1	0	1	1	0	1	0	0	1	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...
63989	10	304.30	1	1	0	1	1	0	1	0	0	0	1
63990	6	80.02	0	1	0	0	0	0	1	0	0	1	0
63991	1	306.10	1	0	1	0	1	0	1	0	0	1	0
63993	4	374.07	0	1	0	0	1	0	1	0	0	1	0
63998	1	552.94	1	0	1	0	1	0	1	0	1	0	0

42693 rows × 13 columns

Data Splitting

The data splitting process is similar to general machine learning workflow

from sklearn.model_selection import train_test_split

T_col = 'treatment'
target_col = 'conversion'
X_col = ['recency', 'history', 'used_discount', 'used_bogo', 'is_referral',
       'zip_code_Rural', 'zip_code_Surburban',
       'zip_code_Urban', 'channel_Multichannel', 'channel_Phone',
       'channel_Web']

X = treatment_encoded.drop(target_col, axis=1)
y = treatment_encoded[target_col]

train_full,test = train_test_split(treatment_encoded,test_size=0.2)
train,val = train_test_split(train_full,test_size=0.25)

train.columns

Index(['recency', 'history', 'used_discount', 'used_bogo', 'is_referral',
       'conversion', 'treatment', 'zip_code_Rural', 'zip_code_Surburban',
       'zip_code_Urban', 'channel_Multichannel', 'channel_Phone',
       'channel_Web'],
      dtype='object')

Exploratory Data Analysis

# recency 
plt.hist(train['history'])
plt.title('history')

Text(0.5, 1.0, 'history')

# recency 
plt.hist(train['recency'])
plt.title('Recency')

Text(0.5, 1.0, 'Recency')

# Discount 
plt.hist(train['used_discount'])
plt.title('Use Discount')

Text(0.5, 1.0, 'Use Discount')

plt.hist(train['used_bogo'])
plt.title('Use Buy One Get One')

Text(0.5, 1.0, 'Use Buy One Get One')

plt.hist(train['is_referral'])
plt.title('Use Referral')

Text(0.5, 1.0, 'Use Referral')

plt.hist(train['treatment'])
plt.title('Treatment (Buy One Get One)')

Text(0.5, 1.0, 'Treatment (Buy One Get One)')

plt.hist(train['channel_Multichannel'])
plt.title('channel_Multichannel')

Text(0.5, 1.0, 'channel_Multichannel')

plt.hist(train['channel_Phone'])
plt.title('channel_Phone')

Text(0.5, 1.0, 'channel_Phone')

plt.hist(train['channel_Web'])
plt.title('channel_Web')

Text(0.5, 1.0, 'channel_Web')

train.columns

Index(['recency', 'history', 'used_discount', 'used_bogo', 'is_referral',
       'conversion', 'treatment', 'zip_code_Rural', 'zip_code_Surburban',
       'zip_code_Urban', 'channel_Multichannel', 'channel_Phone',
       'channel_Web'],
      dtype='object')

plt.hist(train['zip_code_Urban'])
plt.title('zip_code_Urban')

Text(0.5, 1.0, 'zip_code_Urban')

plt.hist(train['zip_code_Surburban'])
plt.title('zip_code_Surburban')

Text(0.5, 1.0, 'zip_code_Surburban')

plt.hist(train['zip_code_Rural'])
plt.title('zip_code_Rural')

Text(0.5, 1.0, 'zip_code_Rural')

2. Modelling

Approaches :

S-Learner
T-Learner

from sklift.metrics import uplift_at_k,uplift_auc_score,uplift_by_percentile,qini_auc_score
from sklift.viz import plot_uplift_curve

Single Model Approach (S-Learner)

Source : Causal Inference for the Brave and True

The S-Learner model is a technique in uplift modeling used to understand the impact of a treatment, like a marketing campaign, on individuals' behavior. It works by building a predictive model that considers whether each person received the treatment or not and then estimates how this treatment affected their response. By comparing the predicted outcomes of those who received the treatment to those who didn't, it calculates an uplift score, showing the incremental effect of the treatment

The S- Learner model is actually general machine learning model such as LogisticRegression, SVM, Linear Regression and etc. however the model purpose now is to predict the response / outcome variable with or without treatment to calculate uplift score we just measure the difference between prediction if treatment variable ==1 / yes and no =0

Training Process :

Train Model like general machine learning --> can be framed as regression or classification

if our outcomes is --> binary --> classification -if our outcomes is --> continuous --> regression

Prediction Process : To Measure Conditional Average Treatment Effect we simply

Predict using the model on test data given if treatment == 0 (No Treatment)
Predict using the model on test data given if treatment == 1 (Treatment)
Compare the differences

# create dataframe to add score 
model_comparison = pd.DataFrame()
model_name = []
auc_uplift = []

from sklift.models import SoloModel


slearner = SoloModel(
    estimator=xgb.XGBClassifier(),
    method='treatment_interaction'
)
slearner = slearner.fit(
    train[X_col], train[target_col], train[T_col],

)

uplift_slearner = slearner.predict(val[X_col])

slearner_uplift = uplift_auc_score(y_true=val[target_col], uplift=uplift_slearner, treatment=val[T_col])

model_name.append('slearner')
auc_uplift.append(slearner_uplift)

2023-09-10 21:59:17,315 [18308] WARNING  py.warnings:109: [JupyterRequire] c:\users\fakhri robi aulia\appdata\local\programs\python\python38\lib\site-packages\xgboost\sklearn.py:1224: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
  warnings.warn(label_encoder_deprecation_msg, UserWarning)



[21:59:17] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.5.0/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.

Two Models Approach

Source : Causal Inference for the Brave and True

The T-Learner model in uplift modeling involves a two-step approach. First, it builds a model (Model T) to predict who receives a treatment based on individuals' characteristics. Second, it builds another model (Model Y) to predict individual outcomes, considering the treatment. By comparing these predictions, it calculates an uplift score, which shows the incremental effect of the treatment on each person

Training Process : During training process we trained 2 models, one for treatment=yes, and the other is for training = no

Prediction Process : To Measure Conditional Average Treatment Effect we simply

Predict using 1st model
Predict using 2nd model
Uplift = predicted_values 1st model - predicted_values 2nd model

from sklift.models import TwoModels




t_learner = TwoModels(
    estimator_trmnt=xgb.XGBClassifier(eval_metric="auc"), 
    estimator_ctrl=xgb.XGBClassifier(eval_metric="auc"), 
    method='vanilla'
)
t_learner = tm.fit(
    train[X_col], train[target_col], train[T_col],

)

uplift_tm = t_learner.predict(val[X_col])

t_learner_uplift = uplift_auc_score(y_true=val[target_col], uplift=uplift_tm, treatment=val[T_col])

model_name.append('tlearner')
auc_uplift.append(t_learner_uplift)

Modelling Summary

model_comparison['model_name'] = model_name 
model_comparison['auc_uplift'] = auc_uplift

model_comparison

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	model_name	auc_uplift
0	slearner	0.040043
1	tlearner	0.030525

We can see that S-Learner model perform better even with more simple model

Evaluating Uplift Model

We are going to evaluate on test data

uplift_slearner_test = slearner.predict(test[X_col])

slearner_test_uplift = uplift_auc_score(y_true=test[target_col], uplift=uplift_slearner_test, treatment=test[T_col])

slearner_test_uplift

0.01040496699225714

pd.DataFrame({
    'phase' : ['validation_set','test_set'],
    'uplift_auc' :[slearner_uplift,slearner_test_uplift]
})

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	phase	uplift_auc
0	validation_set	0.040043
1	test_set	0.010405

Uplift Curve

import matplotlib.pyplot as plt

plt.hist(uplift_slearner_test,bins=50)
plt.xlabel('Uplift score')
plt.ylabel('Number of observations in test set')

Text(0, 0.5, 'Number of observations in test set')

plt.hist(uplift_slearner,bins=50)
plt.xlabel('Uplift score')
plt.ylabel('Number of observations in val set')

Text(0, 0.5, 'Number of observations in val set')

from sklift.viz import plot_uplift_curve

uplift_disp = plot_uplift_curve(
    test[target_col], uplift_slearner_test, test[T_col],
    perfect=True, name='Model name'
);

We can see that our uplift score mostly concentrated at 0, meaning that our intervention / treatment does not change conversion much

Policy

After looking at the uplift result looks like close to random approach, modelling result may not always be better than baseline, so at the time if we decide to choose random approach

Key Take Aways

References

Uplift Modeling for Multiple Treatments with Cost Optimization, Zhenyu Zao and Totte Harinen (Uber)
https://www.slideshare.net/PierreGutierrez2/introduction-to-uplift-modelling?from_search=0
https://www.slideshare.net/PierreGutierrez2/beyond-churn-prediction-an-introduction-to-uplift-modeling?from_search=2

fakhrirobi / uplift_modelling

Problem Description

Business Objective

Solution

Model Metrics

Data Description

Workflow

1. Data Preparation

Load Data

Check data shapes & types

Data Preprocessing

Label Encoding

Data Splitting

Exploratory Data Analysis

2. Modelling

Single Model Approach (S-Learner)

Two Models Approach

Modelling Summary

Evaluating Uplift Model

Uplift Curve

Policy

Key Take Aways

References

About

Languages