Regression is used to find a model for continuous variables. The model can then be used for prediction, control, and causal-inference tasks. Before running a regression, we need to decide which family of regression functions suits the sample data, e.g. linear or polynomial (such as a parabola).
- Python
- PyCharm
In ML Lecture 1, I followed the tutorial and practiced using gradient descent to solve linear regression.
- Without using the closed-form solution of linear regression
- With plain gradient descent, it takes too long to find the optimal parameters. Therefore, we give b and w their own learning rates and update them at each iteration. This neat method is called Adagrad.
- Adagrad: I remember my professor telling us that it normalizes the gradient at each iteration by dividing by the root of the accumulated squared past gradients
(Comparison figures: pure gradient descent vs. gradient descent with Adagrad)
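The Adagrad update described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's actual demo code: the toy data (y = 3x + 2), learning rate, and iteration count are all assumptions chosen so the fit converges.

```python
import numpy as np

# Toy data (hypothetical): samples from y = 3x + 2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 2.0

b, w = 0.0, 0.0              # parameters to learn
lr = 1.0                     # base learning rate (assumed)
eps = 1e-8                   # guard against division by zero
sum_gb2, sum_gw2 = 0.0, 0.0  # accumulated squared gradients

for _ in range(10000):
    err = y - (b + w * x)
    # Gradients of the squared-error loss w.r.t. b and w
    gb = -2.0 * err.sum()
    gw = -2.0 * (err * x).sum()
    # Adagrad: accumulate squared gradients, then scale each step
    # by the root of that sum, so b and w get separate step sizes
    sum_gb2 += gb ** 2
    sum_gw2 += gw ** 2
    b -= lr / (np.sqrt(sum_gb2) + eps) * gb
    w -= lr / (np.sqrt(sum_gw2) + eps) * gw

print(b, w)  # should approach b = 2, w = 3
```

Because the accumulated sums only grow, each parameter's effective learning rate shrinks over time, which is why a large base rate like 1.0 is usable here.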
We have m data tuples, each containing a vector x and a scalar y, where x is an n-dimensional vector. Now assume y has a linear relationship with x:

y = α + β₁x₁ + ⋯ + βₙxₙ
To simplify the notation, we combine α and all the βᵢ into a single vector β and stack all the y values into a vector y. We create a matrix X whose rows are the vectors x, with a 1 prepended as the first column.
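Building this design matrix is a one-liner in NumPy. The sample values below are made up purely for illustration:

```python
import numpy as np

# Hypothetical data: m = 4 samples, each x is n = 2 dimensional
x_samples = np.array([[1.0, 2.0],
                      [2.0, 0.5],
                      [3.0, 1.5],
                      [4.0, 2.5]])

# Design matrix X: prepend a column of ones so the intercept α
# is absorbed into β as its first component
X = np.hstack([np.ones((x_samples.shape[0], 1)), x_samples])
print(X.shape)  # (4, 3): m rows, n + 1 columns
```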
We assume the linear regression model is ŷ = Xβ. To minimize the error, the loss function is the sum of squared residuals, L(β) = ‖y − Xβ‖² = (y − Xβ)ᵀ(y − Xβ).
Xβ can be viewed as a point in the column space of X, the subspace of Rᵐ spanned by the n+1 columns of X. X is fixed and β is the variable; we want to find the β that minimizes the error. If all errors were 0, y itself would lie in this subspace; in general, the best ŷ = Xβ is the projection of y onto the column space of X.
Obviously, y − ŷ is then orthogonal to the columns of X. We can use this property to get the best β instead of directly minimizing the loss function with partial derivatives: Xᵀ(y − Xβ) = 0, which gives the normal equations XᵀXβ = Xᵀy and hence β = (XᵀX)⁻¹Xᵀy.
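The normal equations can be solved in a couple of lines. This sketch uses made-up data generated from a known linear rule (intercept 2, coefficients 1 and 3) so that the recovered β is easy to check:

```python
import numpy as np

# Hypothetical sample: m = 5 points, each x is n = 2 dimensional
x_samples = np.array([[1.0, 2.0],
                      [2.0, 0.0],
                      [3.0, 1.0],
                      [4.0, 3.0],
                      [5.0, 2.0]])
# Target generated exactly from y = 2 + 1*x1 + 3*x2 (assumed ground truth)
y = 2.0 + 1.0 * x_samples[:, 0] + 3.0 * x_samples[:, 1]

# Design matrix: prepend a column of ones for the intercept
X = np.hstack([np.ones((x_samples.shape[0], 1)), x_samples])

# Orthogonality of the residual to the columns of X gives the
# normal equations XᵀXβ = Xᵀy; solve them directly
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # recovers [2, 1, 3]
```

In practice `np.linalg.lstsq(X, y, rcond=None)` is preferred over forming XᵀX explicitly, since it is numerically more stable when the columns of X are nearly dependent.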
I use this as my linear regression model and solve for β. With a two-dimensional x, the fitted model is a plane over the input space R².
- AI course of the International Management Department at NTUST
- ML Lecture 1: Regression - Demo
- Hung-yi Lee ML Course
- Multiple Linear Regression