LinearRegressionMath

LinearRegression

Linear RegressionMath From Scratch

An implementation of Linear Regression from scratch in python 📈. Includes the hypothesis function, partial differentiation of variables, parameter update rule and gradient descent.

Getting Started

Download the project on your computer. git clone https://github.com/gabrielferrazduque/LinearRegressionmath or download the ZIP file
Go to the directory with the file: cd LinearRegressionmath
Download the required packages: pip install -r requirements.txt
Run the project: python3 LinearRegressionmath.py

How It Works

What Is Linear Regression?

Linear regression is a method for approximating a linear relationship between two variables. While that may sound complicated, all it really means is that it takes some input variable, like the age of a house, and finds out how it's related to another variable, for example, the price it sells at.

We use it when the data has a linear relationship, which means that when you plot the points on a graph, the data lies approximately in the shape of a straight line.

Linear Regression involves a couple of steps:

Randomly initialising parameters for the hypothesis function
Computing the mean squared error
Calculating the partial derivatives
Updating the parameters based on the derivatives and the learning rate
Repeating from 2 until the error is minimised

The Hypothesis Function

The linear equation is the standard form that represents a straight line on a graph, where m represents the gradient, and b represents the y-intercept.

The Hypothesis Function is the exact same function in the notation of Linear Regression.

$h_{\theta}(x) = \theta_{1}x + \theta_{0}$

The two variables we can change – m and b – are represented as parameters theta1 and theta0

The Error Function

Clearly the line drawn in the graph above is wrong. But how wrong is it? That's what the error function is for - it calculates the total error of your graph. We'll be using an error function called the Mean Squared Error function, or MSE, represented by the letter J.

$J(\theta_{0}, \theta_{1}) = \frac{1}{2}\sum_{i=1}^{m}(h_{\theta}(x_{i}) - y_{i})^{2}$

Now while that may look complicated, what it's doing is actually quite simple. The function J takes our parameters theta0, and theta1 as an input. Then, from every point from 1 to m, where m is the number of points, it calculates the distance between the x value and the value predicted by our function and squares it. Squaring it ensures that the error is always positive, and that the error is greater the further away the predicted value is from the actual value.

Now that we have a value for how wrong our function is, we need to adjust the function to reduce this error.

Calculating Derivatives

If we graph our parameters against the error (i.e graphing the cost function), we'll find that it forms something similar to the graph below. Our goal is to find the lowest point of that graph, where the error is at its lowest. This is called minimising the cost function.

To do this, we need to consider what happens at the bottom of the graph - the gradient is zero. So to minimize the cost function, we need to get the gradient to zero.

The gradient is given by the derivative of the function, and the partial derivatives of the functions are

$\frac{\delta }{\delta \theta_{0}} = h_{\theta}(x_{i})- y_{i}$

$\frac{\delta }{\delta \theta_{0}} = (h_{\theta}(x_{i})- y_{i})x_{i}$

Updating The Parameters Based On The Learning Rate

Now we need to update our parameters to reduce the gradient. To do this, we use the gradient update rule

$\theta_{0} = \theta_{0} - \alpha\frac{\delta}{\delta \theta_{0}}$

$\theta_{1} = \theta_{1} - \alpha\frac{\delta}{\delta \theta_{1}}$

Alpha is what we call the Learning rate, which is a small number that allows the parameters to be updated by a small amount. As mentioned above, we are trying to update the gradient such that it's closer to zero (the bottom). The Learning rate helps guide the network to the lowest point on the curve by small amounts.

Minimising the Cost Function

Now we repeat these steps - checking the error, calculating the derivatives, and updating the weights, until the error is as low as possible. This is called minimising the cost funtion.

Once your error is minimised, your line should now be the best possible fit to approximate the data!

gabrielferrazduque / LinearRegressionMath