As populations grow, so does the demand for housing, which is a basic need for everyone. Many companies now buy and sell land, buildings, and houses. No buyer wants to pay the wrong price for a house, and a real estate company that buys houses for resale can lose money if it overpays. Setting a sound purchase price requires analyzing the factors that influence property prices, and price predictions based on a property's location and specifications support that analysis. This project therefore builds a machine learning prediction model, using linear regression, to help an individual or a company buy property at the right price.
The goals of this project are:
- Extract, explore, analyze, and visualize the data
- Predict house prices using a linear regression model
- Impute missing values and encode categorical features
- Improve model performance by reducing overfitting
- Evaluate the prediction model using the Mean Absolute Error (MAE)
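
The goals above could be wired together roughly as follows. This is a minimal sketch, not the project's actual code. It assumes pandas and scikit-learn, uses a tiny made-up frame that borrows a few of the dataset's column names, and swaps in `Ridge` (a regularized linear regression) as one possible way to reduce overfitting; the project may use a different technique.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data for illustration only; the values are invented, not taken from the dataset.
df = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 2, 5, 3, 4],
    "Distance": [2.5, 5.0, np.nan, 8.4, 3.1, 12.0, 6.3, 9.8],
    "Type": ["u", "h", "h", "t", "u", "h", np.nan, "h"],
    "Price": [600000, 950000, 1300000, 820000, 580000, 1500000, 900000, 1200000],
})

numeric = ["Rooms", "Distance"]
categorical = ["Type"]
X, y = df[numeric + categorical], df["Price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Impute missing values, then one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), numeric),
    ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                          OneHotEncoder(handle_unknown="ignore")), categorical),
])

# Ridge is an assumption here: linear regression with an L2 penalty.
model = make_pipeline(preprocess, Ridge(alpha=1.0))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE: {mae:,.0f}")
```

Keeping the imputers and the encoder inside the pipeline means they are fitted on the training rows only, which avoids leaking information from the test rows into the model.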
This project consists of three parts:
- Prepare Data
  - Import
  - Explore
  - Analysis
  - Split
- Build the Model
  - Baseline
  - Iteration
  - Evaluation
- Communicate Results
  - Testing
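
To make the Baseline and Evaluation steps concrete: a common baseline for a regression task is to always predict the mean training price and measure its MAE, which the linear regression model then has to beat. The sketch below uses only the standard library and four invented prices. The helper `mean_absolute_error` is written out by hand here for illustration; a project like this would more likely use the scikit-learn function of the same name.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical training prices in Australian dollars (invented for illustration).
y_train = [600_000, 800_000, 1_000_000, 1_200_000]

# Baseline model: always predict the mean training price.
y_mean = mean(y_train)
y_pred_baseline = [y_mean] * len(y_train)

baseline_mae = mean_absolute_error(y_train, y_pred_baseline)
print(f"Mean price: {y_mean:,.0f}")          # Mean price: 900,000
print(f"Baseline MAE: {baseline_mae:,.0f}")  # Baseline MAE: 200,000
```

A trained model is only useful if its MAE on held-out data is clearly lower than this baseline figure.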
The dataset has 34,857 observations and 21 variables (the target, Price, among them) related to real estate prices in Melbourne, Australia. Of these variables, 8 are of type object, 12 are floats, and 1 is an int. A brief explanation of each variable follows:
- Suburb = Suburb
- Address = Address
- Rooms = Number of rooms
- Type = Property type: br - bedroom(s); h - house, cottage, villa, semi, terrace; u - unit, duplex; t - townhouse; dev site - development site; o res - other residential.
- Price = Price in Australian dollars
- Method = Sale method: S - property sold; SP - property sold prior; PI - property passed in; PN - sold prior not disclosed; SN - sold not disclosed; NB - no bids; VB - vendor bid; W - withdrawn prior to auction; SA - sold after auction; SS - sold after auction price not ...
- SellerG = Real Estate Agent
- Date = Date sold
- Distance = Distance from the CBD in kilometres
- Postcodes = Postcodes
- Bedroom2 = Number of bedrooms, scraped from a different source
- Bathroom = Number of bathrooms
- Car = Number of car spots
- Landsize = Land size in square metres
- BuildingArea = Building size in square metres
- YearBuilt = Year the house was built
- CouncilArea = Governing council for the area
- Latitude = Latitude
- Longitude = Longitude
- Regionname = General region (West, North West, North, North East, etc.)
- Propertycount = Number of properties in the suburb
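
The type counts, the missing values, and the number of distinct categories per column can be checked with a few pandas calls. The sketch below is an illustration under assumptions: it runs on a four-row toy frame with invented values rather than the real file, and the variable names (`dtype_counts`, `missing_share`, `cardinality`) are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; the real data has 34,857 rows and 21 columns.
df = pd.DataFrame({
    "Suburb": ["Abbotsford", "Richmond", "Abbotsford", "Carlton"],
    "Type": ["h", "u", "h", "t"],
    "Rooms": [2, 3, 3, 4],
    "Price": [1_000_000.0, np.nan, 1_200_000.0, 900_000.0],
})

# How many columns there are of each dtype.
dtype_counts = df.dtypes.astype(str).value_counts()

# Share of missing values per column, useful when planning imputation.
missing_share = df.isna().mean()

# Distinct values per non-numeric column; very high counts flag
# high-cardinality features that are costly to one-hot encode.
cardinality = df.select_dtypes(exclude="number").nunique()

print(dtype_counts)
print(missing_share)
print(cardinality)
```

On the real dataset, columns such as Address or SellerG are natural candidates to inspect this way before deciding which categorical features to encode.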
https://knowledgeburrow.com/what-is-high-cardinality-vs-low-cardinality/
https://statisticsbyjim.com/regression/multicollinearity-in-regression-analysis/
https://vitalflux.com/correlation-heatmap-with-seaborn-pandas/
https://machinelearningmastery.com/data-leakage-machine-learning/
https://dataschool.com/fundamentals-of-analysis/what-is-an-outlier/
https://www.kaggle.com/code/phoonyein/melbourne-houses-price-analysis-prediction