darshil2848 / House-Price-Prediction

House Price Analysis and Sales Price Prediction

Surprise Housing House Prediction

A US-based housing company named Surprise Housing has decided to enter the Australian market. The company uses data analytics to purchase houses at a price below their actual value and sell them at a higher price. For this purpose, the company has collected a data set from the sale of houses in Australia. The data is provided in the CSV file below.

Problem Statement

Business Understanding

The company is looking at prospective properties to buy to enter the market. You are required to build a regression model using regularisation in order to predict the actual value of the prospective properties and decide whether to invest in them or not.

Business Goal:

You are required to model the price of houses with the available independent variables. This model will then be used by the management to understand how exactly the prices vary with the variables. They can accordingly manipulate the strategy of the firm and concentrate on areas that will yield high returns. Further, the model will be a good way for management to understand the pricing dynamics of a new market.

Business Risk:

  • Predicting too high a sale price for a house would not attract customers, which would lead to a loss for the company
  • Predicting too low a sale price for a house would reduce the company's profit margin

Requirement:

  • Which variables are significant in predicting the sale price of a house?
  • How well do those variables describe the sale price of a house?
  • What is the optimal value of lambda for ridge and lasso regression?

General Information

  • Steps for creating a regularized regression model:
  1. Data Visualization
    1. Perform EDA to understand various variables.
    2. Check the correlation between the variables.
  2. Data Preparation
    1. Create dummy variables for all the categorical features.
    2. Divide the data into train and test sets.
    3. Perform scaling.
    4. Divide the data into dependent and independent variables.
  3. Data Modelling & Evaluation
    1. Create Linear Regression model using RFE
    2. Create L1 and L2 Regularized Models using the output of the RFE
    3. Check the various assumptions.
    4. Check the adjusted R-squared for both train and test data.
    5. Report the final model.
  • Data set: train.csv
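
The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the data frame is synthetic (made-up values under a few real column names from train.csv), `MinMaxScaler` and `n_features_to_select=3` are assumed choices, and the alpha values are the ones reported later in this README.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for train.csv: real column names, made-up values.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "GrLivArea": rng.uniform(400, 4000, n),
    "LotArea": rng.uniform(1000, 60000, n),
    "GarageArea": rng.uniform(0, 1200, n),
    "MSZoning": rng.choice(["FV", "RL", "RM"], n),
})
df["SalePrice"] = (
    100 * df["GrLivArea"]
    + 50 * df["GarageArea"]
    + 20000 * (df["MSZoning"] == "RL")
    + rng.normal(0, 10000, n)
)

# 2.1 Dummy variables for the categorical features.
df = pd.get_dummies(df, columns=["MSZoning"], drop_first=True, dtype=float)

# 2.2 Train/test split.
train, test = train_test_split(df, train_size=0.7, random_state=42)

# 2.3 Scaling: fit on the training rows only, then reuse on the test rows.
scaler = MinMaxScaler()
train = pd.DataFrame(scaler.fit_transform(train), columns=df.columns, index=train.index)
test = pd.DataFrame(scaler.transform(test), columns=df.columns, index=test.index)

# 2.4 Dependent (y) and independent (X) variables.
y_train, X_train = train["SalePrice"], train.drop(columns="SalePrice")
y_test, X_test = test["SalePrice"], test.drop(columns="SalePrice")

# 3.1 Linear regression with recursive feature elimination (RFE).
rfe = RFE(LinearRegression(), n_features_to_select=3).fit(X_train, y_train)
selected = list(X_train.columns[rfe.support_])

# 3.2 L2 (Ridge) and L1 (Lasso) models on the RFE-selected features.
models = {
    "ridge": Ridge(alpha=5.0).fit(X_train[selected], y_train),
    "lasso": Lasso(alpha=0.001).fit(X_train[selected], y_train),
}

def adjusted_r2(r2, n_rows, n_features):
    """Adjusted R-squared: penalises R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n_rows - 1) / (n_rows - n_features - 1)

# 3.4 Adjusted R-squared on train and test data.
scores = {}
for name, model in models.items():
    for label, X, y in [("train", X_train, y_train), ("test", X_test, y_test)]:
        r2 = r2_score(y, model.predict(X[selected]))
        scores[(name, label)] = adjusted_r2(r2, len(y), len(selected))
        print(f"{name} {label}: adjusted R2 = {scores[(name, label)]:.3f}")
```

Fitting the scaler on the training rows only keeps information from the test set out of the model, which is why the split comes before scaling in the step list.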

Conclusions

EDA on Continuous Variables:
  1. LotFrontage - As LotFrontage increases from [20.7 to 196.2], the SalePrice increases
  2. LotArea - As LotArea increases from [1086 to 65483], the SalePrice increases
  3. MasVnrArea - As MasVnrArea increases from [0 to 1280], the SalePrice increases
  4. BsmtFinSF1 - As BsmtFinSF1 increases from [0 to 2257], the SalePrice increases
  5. BsmtFinSF2 - As BsmtFinSF2 increases, there is not much effect on SalePrice
  6. BsmtUnfSF - As BsmtUnfSF increases from [1168 to 2336], the SalePrice increases
  7. TotalBsmtSF - As TotalBsmtSF increases from [0 to 3666], the SalePrice increases
  8. 1stFlrSF - As 1stFlrSF increases from [329.64 to 3384], the SalePrice increases
  9. 2ndFlrSF - As 2ndFlrSF increases from [206.5 to 1858], the SalePrice increases
  10. GrLivArea - As GrLivArea increases from [328 to 4480], the SalePrice increases
  11. GarageYrBlt - As GarageYrBlt increases, there is not much effect on SalePrice
  12. GarageArea - As GarageArea increases from [0 to 1276], the SalePrice increases
  13. WoodDeckSF - As WoodDeckSF increases from [0 to 685], the SalePrice increases
  14. OpenPorchSF - As OpenPorchSF increases, there is not much effect on SalePrice
  15. EnclosedPorch - As EnclosedPorch increases, there is not much effect on SalePrice
  16. AgeBuilt - As AgeBuilt increases from [0 to 81], the SalePrice decreases, and from [81 to 136] the SalePrice slightly increases
  17. AgeRemod - As AgeRemod increases from [0 to 60], the SalePrice decreases
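
One way to check trends like these is to combine a correlation coefficient with the mean SalePrice per bin of the variable. The sketch below assumes synthetic data (only the GrLivArea range is taken from the list above) and an arbitrary choice of five equal-width bins; the notebook's actual plots and method may differ.

```python
import numpy as np
import pandas as pd

# Synthetic data: GrLivArea spans the range quoted above, prices are made up.
rng = np.random.default_rng(1)
area = rng.uniform(328, 4480, 500)
df = pd.DataFrame({
    "GrLivArea": area,
    "SalePrice": 50000 + 90 * area + rng.normal(0, 20000, 500),
})

# Pearson correlation between the variable and the target.
corr = df["GrLivArea"].corr(df["SalePrice"])

# Mean SalePrice within five equal-width bins of GrLivArea.
bins = pd.cut(df["GrLivArea"], bins=5)
binned_mean = df.groupby(bins, observed=True)["SalePrice"].mean()

print(f"correlation: {corr:.3f}")
print(binned_mean)
```

A binned mean that rises steadily supports a statement such as "as GrLivArea increases, the SalePrice increases", while a flat one corresponds to "not much effect".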
EDA on Categorical Variables:
  1. MSSubClass - 20, 50, 75, 120 have higher SalePrice than other MSSubClass
  2. MSZoning - RL and FV have higher SalePrice than other MSZoning
  3. LotShape - IR2 and IR1 have slightly higher average SalePrice than other LotShape
  4. LandContour - HLS has slightly higher average SalePrice than other LandContour
  5. LotConfig - CulDSac has slightly higher average SalePrice than other LotConfig
  6. Neighborhood - NoRidge, NridgHt, Timber and StoneBr have higher SalePrice than other Neighborhood
  7. Condition1 - PosN and RRNn have slightly higher SalePrice than other Condition1
  8. BldgType - 1Fam and TwnhsE have slightly higher SalePrice than other BldgType
  9. HouseStyle - 2Story, 1Story and 2.5Fin have higher SalePrice than other HouseStyle
  10. OverallQual - As OverallQual increases, the SalePrice increases steeply
  11. OverallCond - As OverallCond increases, the SalePrice increases
  12. RoofStyle - No effect of RoofStyle on the SalePrice
  13. Exterior1st - VinylSd, CemntBd and Stone have higher SalePrice than other Exterior1st
  14. Exterior2nd - VinylSd, CemntBd and ImStucc have higher SalePrice than other Exterior2nd
  15. MasVnrType - Stone and SBrkr have higher average SalePrice than other MasVnrType
  16. ExterQual - Ex has significantly higher SalePrice than other ExterQual
  17. ExterCond - No effect of ExterCond on the SalePrice
  18. Foundation - PConc has higher average SalePrice than other Foundation
  19. BsmtQual - Ex and Gd have significantly higher SalePrice than other BsmtQual
  20. BsmtCond - TA and Gd have higher SalePrice than other BsmtCond
  21. BsmtExposure - Gd has higher SalePrice than other BsmtExposure
  22. BsmtFinType1 - GLQ has higher SalePrice than other BsmtFinType1
  23. BsmtFinType2 - GLQ has higher SalePrice than other BsmtFinType2
  24. HeatingQC - As HeatingQC becomes poor the SalePrice decreases
  25. BsmtFullBath - No effect of BsmtFullBath on the SalePrice
  26. FullBath - 2 and 3 have significantly higher SalePrice than other FullBath
  27. HalfBath - 1 has slightly higher average SalePrice than other HalfBath
  28. BedroomAbvGr - 0 and 4 have slightly higher average SalePrice than other BedroomAbvGr
  29. KitchenQual - Ex has significantly higher SalePrice than other KitchenQual
  30. TotRmsAbvGrd - As TotRmsAbvGrd increases, the SalePrice increases
  31. Fireplaces - As Fireplaces increases, the SalePrice increases
  32. FireplaceQu - Ex has significantly higher SalePrice than other FireplaceQu
  33. GarageType - Attchd and BuiltIn have higher SalePrice than other GarageType
  34. GarageFinish - Fin has higher SalePrice than other GarageFinish
  35. GarageCars - As GarageCars increases, the SalePrice increases, except for 4
  36. GarageQual - Gd and Ex have higher SalePrice than other GarageQual
  37. Fence - No effect of Fence on the SalePrice
  38. MoSold - No effect of MoSold on the SalePrice
  39. YrSold - No effect of YrSold on the SalePrice
  40. SaleType - New, CWD and Con have higher SalePrice than other SaleType
  41. SaleCondition - Partial has significantly higher SalePrice than other SaleCondition
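
Comparisons between category levels like those above can be reproduced with a grouped summary. The rows below are made up for illustration; only the column names KitchenQual and SalePrice come from the data set, and the choice of median as the ranking statistic is an assumption.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "KitchenQual": ["Ex", "Ex", "Gd", "Gd", "Gd", "TA", "TA", "TA", "Fa"],
    "SalePrice": [420000, 380000, 230000, 210000, 250000,
                  140000, 150000, 135000, 100000],
})

# Count, mean and median SalePrice per level, highest median first.
summary = (
    df.groupby("KitchenQual")["SalePrice"]
    .agg(["count", "mean", "median"])
    .sort_values("median", ascending=False)
)
print(summary)
```

Reporting the count next to the mean and median helps flag levels with too few houses to support a conclusion.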
Which variables are significant in predicting the price of a house?
  • 'OverallQual_10'
  • 'OverallQual_9'
  • 'Neighborhood_NoRidge'
  • 'FullBath_3'
  • 'TotRmsAbvGrd_11'
  • 'Fireplaces_3'
How well do those variables describe the price of a house?
  • OverallQual_10 = 0.833928
  • OverallQual_9 = 0.830237
  • Neighborhood_NoRidge = 0.592742
  • FullBath_3 = 0.530678
  • TotRmsAbvGrd_11 = 0.505044
  • Fireplaces_3 = 0.442772
What is the optimal value of lambda for ridge and lasso regression?
  • Ridge : 5.0
  • Lasso : 0.001
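
In scikit-learn the regularisation strength called lambda here is the `alpha` parameter of `Ridge` and `Lasso`. A common way to choose it is a cross-validated grid search. The sketch below assumes synthetic data from `make_regression`, an arbitrary candidate grid, and mean absolute error as the scoring metric, so it will not reproduce the values above and is not necessarily how the notebook found them.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression problem standing in for the prepared housing data.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Candidate regularisation strengths (an assumed grid).
alphas = [0.0001, 0.001, 0.01, 0.1, 1.0, 5.0, 10.0, 50.0]

def best_alpha(estimator):
    """Return the alpha with the best 5-fold cross-validated score."""
    search = GridSearchCV(
        estimator,
        param_grid={"alpha": alphas},
        scoring="neg_mean_absolute_error",
        cv=5,
    )
    search.fit(X, y)
    return search.best_params_["alpha"]

ridge_alpha = best_alpha(Ridge())
lasso_alpha = best_alpha(Lasso(max_iter=10000))
print("Ridge:", ridge_alpha)
print("Lasso:", lasso_alpha)
```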
Final Model :

The Lasso regularized model is chosen as the final model to predict the SalePrice for the following reasons:

  • It eliminates more features, which makes the model simpler, more robust, and more generalizable
  • Its performance is similar to that of Ridge regularization
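
The feature-elimination argument can be illustrated by counting non-zero coefficients: the L1 penalty drives some coefficients exactly to zero, while the L2 penalty only shrinks them. The sketch below uses synthetic data and assumed alpha values chosen for that data, not the housing data or the tuned values reported above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data: 30 features, only 5 of which actually drive the target.
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=1)
X = StandardScaler().fit_transform(X)

ridge = Ridge(alpha=5.0).fit(X, y)
lasso = Lasso(alpha=2.0).fit(X, y)

# Features each model keeps (coefficient not exactly zero).
ridge_kept = int(np.sum(ridge.coef_ != 0))
lasso_kept = int(np.sum(lasso.coef_ != 0))

print(f"Ridge keeps {ridge_kept} features, R2 = {ridge.score(X, y):.3f}")
print(f"Lasso keeps {lasso_kept} features, R2 = {lasso.score(X, y):.3f}")
```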

Technologies Used

  • pandas - 1.3.4
  • numpy - 1.20.3
  • matplotlib - 3.4.3
  • seaborn - 0.11.2
  • plotly - 5.8.0
  • scikit-learn - 1.1.2
  • statsmodels - 0.13.2

Contact

Created by [@darshil2848] - feel free to contact me!
