tissyamalik / mlr-boston-house-pricing

Predicting House Price on Boston dataset using Multiple Linear Regression model

Regression model and analysis

In multiple regression, we predict a dependent variable from two or more independent variables, e.g. predicting height from weight and age. Linear regression assumes that the relationship between the dependent variable and the independent variables is linear, so it can be described by a plane (a hyperplane when there are more than two independent variables) known as the regression plane. Fitting the model means finding the regression plane that lies closest to the data points, in the sense of minimising the sum of squared errors.

MLR MODEL

Objective: Predict the dependent variable Yi

Model structure: Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi, where β0 (the y-intercept, or constant) and β1 ... βk (the slopes) are the population regression coefficients and εi is the error term (the part of Yi that the linear combination of the independent variables does not explain)

Model parameters: sample regression coefficients b0, b1... bk

Loss function: Mean squared error (the quantity that is minimised; ordinary least squares itself has no hyperparameters to tune)

Method to estimate model parameters: Ordinary least squares (minimise the sum of squared errors)
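A minimal sketch of ordinary least squares, to make the estimation step concrete. It is not the notebook's code: the helper name fit_ols and the synthetic data are assumptions made for illustration, and NumPy's least-squares solver stands in for whatever routine the project actually calls.

```python
import numpy as np


def fit_ols(X, y):
    """Estimate the intercept b0 and slopes b1..bk by ordinary least squares.

    fit_ols is a hypothetical helper written for this example.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(X)), X])
    # lstsq returns the coefficients that minimise the sum of squared errors.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


# Synthetic data (an assumption for illustration, not the Boston data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_intercept = 2.0
true_slopes = np.array([1.5, -0.7, 0.3])
y = true_intercept + X @ true_slopes + rng.normal(scale=0.1, size=200)

b0, b = fit_ols(X, y)
print("intercept:", b0)
print("slopes:", b)
```

With little noise in the synthetic data, the estimated b0 and b should land close to the values used to generate y.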

Model assumptions: The relationship between the dependent and independent variables is linear; the errors are uncorrelated with each other and with the independent variables (correlation is a measure of the relationship between two variables, derived from covariance); all errors have equal variance; the errors follow a normal probability distribution; the independent variables are not collinear (no multicollinearity)
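One common way to check the no-multicollinearity assumption is the variance inflation factor (VIF): regress each independent variable on all the others and compute 1 / (1 - R square). The sketch below is an illustration under stated assumptions, not the check used in the notebook; the function name variance_inflation_factors and the synthetic features are hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (hypothetical helper for this example)."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = []
    for j in range(n_cols):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = np.sum(residuals ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


# Synthetic features (an assumption for illustration): x2 is nearly a copy of x1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)
x3 = rng.normal(size=500)

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)
```

A VIF near 1 suggests a feature is not explained by the others; values above roughly 5 to 10 are often read as a sign of multicollinearity, although that threshold is a rule of thumb rather than a fixed criterion.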

USE CASE

The Boston dataset was built into the scikit-learn library (the load_boston loader was removed in version 1.2). There are 13 independent features. The model is evaluated with the MSE and R square metrics.
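For reference, the two metrics can be computed by hand as sketched below. The function names compute_mse and compute_r2 and the four sample values are assumptions made for illustration; the notebook presumably uses scikit-learn's own metric functions instead.

```python
def compute_mse(y_true, y_pred):
    """Mean squared error: the average of the squared prediction errors."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def compute_r2(y_true, y_pred):
    """R square: 1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only (not taken from the Boston data).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

mse = compute_mse(y_true, y_pred)
r2 = compute_r2(y_true, y_pred)
print(mse, r2)
```

MSE is expressed in squared units of the target, so lower is better; R square is the share of the target's variance that the model explains, so values closer to 1 are better.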

RESULTS

Duplicate values: None

Missing values: None

Multicollinearity: Yes

Regression coefficients: [ 3.85906801e+00 6.81446545e-03 -1.50026956e+00 -4.86738066e-01 -1.01775264e+00 -1.10716124e-02 -1.21310401e-01 2.51124642e+00 -1.62312529e+01 -9.98516565e-03 4.44664254e-02 1.13416945e-02 2.42143466e-01]

y-intercept: 37.937107741832705

Predictions on test data (first 12 values): [24.9357079 23.75163164 29.32638296 11.97534566 21.37272478 19.19148525 20.5717479 21.21154015 19.04572003 20.35463238 5.44119126 16.93688709]

MSE is 27.19 and R square is 0.52921.

The Python code is available on GitHub.

Hi, I made this project during the 7 Days Bootcamp on Python and Machine Learning, conducted by SHAPEAI. The instructor during the session was Mr. Shaurya Sinha (Data Analyst Intern at Jio). I learned a lot during these 7 days, and it was an amazing experience learning with SHAPEAI.

Here's the link for you to watch the sessions as well

I got hands-on experience with:

  • Python
  • Machine Learning

License: MIT License

