Naive Bayes classifiers are a family of probabilistic classifiers based on Bayes' theorem, with the assumption that each feature is independent of the others. Previously we looked at the learning mechanism of this classifier in terms of maximizing the posterior probability. In this lab we shall learn to code a Gaussian Naive Bayes classifier from scratch, and also learn to use the scikit-learn library for this task.
You will be able to:
- Build a Naive Bayes Classifier in Python and Numpy to make predictions on unseen data
Below we shall attempt to build a Naive Bayes classifier using NumPy calculations only. Python offers sophisticated implementations of this algorithm in scikit-learn, which we shall look at in the following lesson. Here we will use the equations we have learned so far and put them into action for a very simple example.
Let's work with a small toy dataset with continuous features (height, weight, foot size) and a target variable (Person: male or female). We will build a classifier that can learn the joint probability of the data and the target variable, and classify a new example as male or female.
Note: You may also use a multinomial distribution for foot size (categorical). Let's just assume all features are continuous for now.
import numpy as np
import pandas as pd
data = None
data
# Person height weight foot size
# 0 male 6.00 180 12
# 1 male 5.92 190 11
# 2 male 5.58 170 12
# 3 male 5.92 165 10
# 4 female 5.00 100 6
# 5 female 5.50 150 8
# 6 female 5.42 130 7
# 7 female 5.75 150 9
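As a minimal sketch (the values are simply copied from the table above), the toy dataset could be built as a pandas DataFrame like this:

```python
import pandas as pd

# One way to build the toy dataset shown above as a pandas DataFrame
data = pd.DataFrame({
    'Person':    ['male', 'male', 'male', 'male', 'female', 'female', 'female', 'female'],
    'height':    [6.00, 5.92, 5.58, 5.92, 5.00, 5.50, 5.42, 5.75],
    'weight':    [180, 190, 170, 165, 100, 150, 130, 150],
    'foot size': [12, 11, 12, 10, 6, 8, 7, 9],
})
data
```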
So this is a very small dataset, but it will help us better understand how the classifier works in more detail. The results surely won't be groundbreaking. We can see that gender is shown as the strings male/female. Let's change "male" to 0 and "female" to 1 to make a binary categorical variable.
# Subset data and assign 0 and 1
data
# Person height weight foot size
# 0 0 6.00 180 12
# 1 0 5.92 190 11
# 2 0 5.58 170 12
# 3 0 5.92 165 10
# 4 1 5.00 100 6
# 5 1 5.50 150 8
# 6 1 5.42 130 7
# 7 1 5.75 150 9
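One possible way to perform this recoding, assuming the DataFrame is named `data` as above, is a simple mapping:

```python
# Map the string labels to a binary categorical variable: male -> 0, female -> 1
data['Person'] = data['Person'].map({'male': 0, 'female': 1})
data
```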
This is great. Now that we have our data in the format that we need, we can start focusing on the Naive Bayes equation and take our experiment forward.
We have our data `data` with three features $x = (x_1, x_2, x_3)$ (height, weight, foot size) and a class variable $C_k$ (male or female). Bayes' theorem gives the posterior probability of a class given the data:

$$P(C_k \mid x) = \frac{P(C_k)\, p(x \mid C_k)}{p(x)}$$

Here, $P(C_k)$ is the prior probability of the class, $p(x \mid C_k)$ is the likelihood of the data given the class, and $p(x)$ is the evidence.

With continuous data, we assume that the features for each class are distributed according to a Normal/Gaussian distribution, so the class-conditional probability of a feature can be calculated using the Gaussian PDF below:

$$p(x_i \mid C_k) = \frac{1}{\sqrt{2\pi\sigma_{ik}^2}} \exp\left(-\frac{(x_i - \mu_{ik})^2}{2\sigma_{ik}^2}\right)$$

Where $\mu_{ik}$ and $\sigma_{ik}^2$ are the mean and variance of feature $x_i$ within class $C_k$.
Segment (subset) the dataset by class (male/female) and calculate the mean and variance of each feature for the male and female classes. We need these statistics before we can compute the probability distribution of the features for each class.
# Your code here
# Example output
# Mean values for male features
# height 5.855
# weight 176.250
# foot size 11.250
# dtype: float64
# Variance values for male features
# height 0.035033
# weight 122.916667
# foot size 0.916667
# dtype: float64
# Mean values for female features
# height 5.4175
# weight 132.5000
# foot size 7.5000
# dtype: float64
# Variance values for female features
# height 0.097225
# weight 558.333333
# foot size 1.666667
# dtype: float64
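A sketch of one way to obtain these statistics by subsetting on `Person`; note that pandas' `.var()` computes the sample variance (ddof=1), which matches the example output above:

```python
# Subset the data by class and compute per-feature mean and variance
male = data[data['Person'] == 0].drop('Person', axis=1)
female = data[data['Person'] == 1].drop('Person', axis=1)

male_mean, male_var = male.mean(), male.var()
female_mean, female_var = female.mean(), female.var()

print('Mean values for male features\n', male_mean)
print('Variance values for male features\n', male_var)
print('Mean values for female features\n', female_mean)
print('Variance values for female features\n', female_var)
```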
In order to build a functional classifier from the model above, we need some kind of a decision rule (this applies to all classifiers). For our NB classifier, we use the MAP (maximum a posteriori) rule: pick the class with the highest posterior probability.
Now that we have the class-wise means and variances, we can compute the likelihood of each feature value for each class. Recall that we are going to plug a feature value, along with the class mean and variance, into the Gaussian probability density function:
def likelihood(xi, mu, var):
pass
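A minimal implementation sketch of this function, translating the Gaussian PDF above directly:

```python
import numpy as np

def likelihood(xi, mu, var):
    """Gaussian probability density of observing xi given mean mu and variance var."""
    return (1.0 / np.sqrt(2 * np.pi * var)) * np.exp(-((xi - mu) ** 2) / (2 * var))
```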
So this is our function for computing likelihood. We shall now compute the prior probability $P(C_k)$ for each class.
There are two ways to do this:
- Give an equal probability to each of the $k$ classes, i.e. a uniform prior.
- Use the empirical frequency: (number of class samples) / (total number of samples).

For this small dataset the classes are balanced, so both approaches lead to a uniform prior anyway: we get a prior probability of 0.5 for each class, since there are exactly 4 samples per class.
class_priors = None
class_priors
# array([0.5, 0.5])
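One way to compute the priors empirically (for this balanced dataset it reduces to the uniform prior shown above):

```python
# Prior probability of each class: class counts divided by total sample count
class_priors = data['Person'].value_counts(normalize=True).sort_index().values
class_priors  # array([0.5, 0.5])
```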
Great, with our `class_priors` array in place, we can now classify an unlabeled example.
# Add a new example with features of your choice (keep them reasonable)
# Person height weight foot size
# 0 0 6.00 180 12
# 1 0 5.92 190 11
# 2 0 5.58 170 12
# 3 0 5.92 165 10
# 4 1 5.00 100 6
# 5 1 5.50 150 8
# 6 1 5.42 130 7
# 7 1 5.75 150 9
# 8 -99 6.00 130 8 <-- new example
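As a sketch, the new example could be appended like this (the feature values follow the table above, with -99 flagging the unknown class):

```python
# Append an unlabeled example; -99 flags the unknown class label
data.loc[len(data)] = [-99, 6.00, 130, 8]
data
```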
Using a value like -99 for unknowns is a common way of processing data without generating NaN errors, while keeping the unknown identifiable by an unlikely sentinel value (e.g. -99 or -1) that can later be searched for.
Let's calculate the likelihood (the probability of the new data given each class) of each feature $x_i$ for the new example.
# height feature
x_1 = None
# weight feature
x_2 = None
# foot size feature
x_3 = None
x_1, x_2, x_3
# (array([1.57888318, 0.22345873]),
# array([5.98674302e-06, 1.67892979e-02]),
# array([0.00131122, 0.2866907 ]))
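One possible way to compute these, reusing the `likelihood()` function and the class statistics from the sketches above (the names `male_mean`, `male_var`, etc. are assumptions carried over from those sketches):

```python
new = data.loc[8]  # the unlabeled example added above

# Likelihood of each feature under the male (index 0) and female (index 1) class
x_1 = np.array([likelihood(new['height'], male_mean['height'], male_var['height']),
                likelihood(new['height'], female_mean['height'], female_var['height'])])
x_2 = np.array([likelihood(new['weight'], male_mean['weight'], male_var['weight']),
                likelihood(new['weight'], female_mean['weight'], female_var['weight'])])
x_3 = np.array([likelihood(new['foot size'], male_mean['foot size'], male_var['foot size']),
                likelihood(new['foot size'], female_mean['foot size'], female_var['foot size'])])

x_1, x_2, x_3
```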
This completes our Gaussian likelihood calculations $p(x_i \mid C_k)$ for each feature.

Now that we have all the likelihood values and our prior probabilities, the variables in our equation are complete. We now need to calculate the (unnormalized) posterior for each class:

$$P(C_k \mid x) \propto P(C_k)\prod_{i} p(x_i \mid C_k)$$

In this particular example, it can be written simply as:
prediction = None
prediction
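A sketch of this step: the joint (unnormalized posterior) for each class is the prior times the product of the three feature likelihoods:

```python
# Unnormalized posterior (joint probability) for each class: prior * product of likelihoods
prediction = class_priors * x_1 * x_2 * x_3
prediction
```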
Recall that the Gaussian Naive Bayes posterior still needs to be normalized by the evidence $p(x)$. Concretely, the evidence may be computed as follows in this case:

$$p(x) = \sum_{k} P(C_k)\, p(x \mid C_k)$$

The evidence is the sum of all the joint probabilities $p(C_{k}, x)$.
evidence = None
posterior = None
posterior
# array([1.15230663e-05, 9.99988477e-01])
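One way to normalize, following the formula above and reusing `prediction` as the array of joint probabilities:

```python
# Evidence: sum of the joint probabilities over all classes
evidence = np.sum(prediction)

# Normalized posterior probabilities
posterior = prediction / evidence
posterior  # array([1.15230663e-05, 9.99988477e-01])
```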
The `posterior` probability values should now sum up to 1, i.e. form a probability distribution. Let's check.
# Uncomment to check
#np.sum(posterior)
# 1.0
So now we have posterior class probabilities for each class that sum up to 1. Naturally, whichever class shows the higher probability will be chosen as the prediction.
# Predict the class using argmax
# The Naive Bayes predicts Class: 1
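A final sketch of the decision rule, picking the class with the largest posterior:

```python
# MAP decision rule: pick the class index with the largest posterior probability
predicted_class = np.argmax(posterior)
print('The Naive Bayes predicts Class:', predicted_class)
```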
Recall that index 1 refers to female, hence our classifier predicts that the most probable class for the new example is female. Neat, isn't it? Pretty naive, yet highly effective.
- Read the dataset `diabetes.csv` into your code and modify it to perform predictions on the presence or absence of diabetes using the available features and the target variable.

You may need to convert some of the code into functions to help you process your data faster, as this dataset now has 8 features.
In this lab, we looked at building a Naive Bayes classifier from scratch. This was not a complete machine learning experiment; rather, we focused on seeing how the algorithm performs in relation to the underlying mathematics. Next we shall look at how to achieve this functionality in scikit-learn.