The primary goal was to develop a predictive model to accurately forecast the house price per unit area based on features such as the property's age, proximity to key amenities like MRT stations, the number of convenience stores, and its geographical location.
We began by examining our dataset to identify missing values, understand the distribution of each variable, and explore the relationships between features and the target variable.
- Negative Correlation: The 'Distance to the nearest MRT station' showed a strong negative correlation with the house price per unit area.
- Positive Correlation: The 'Number of convenience stores' displayed a moderate positive correlation with house prices.
Features with strong correlations were prioritised for the model. Specifically, 'Distance to the nearest MRT station' was identified as a key predictive feature.
Attempted to capture more complex relationships through polynomial features up to the 2nd degree. However, this did not significantly impact the model's performance.
Two models were selected for comparison: Linear Regression and Random Forest Regressor. Both models were trained and evaluated using Mean Squared Error (MSE) and R-squared (R²) metrics.
- Linear Regression showed better performance compared to the Random Forest Regressor, with a lower MSE and higher R², indicating its effectiveness in capturing the relationship between the selected features and house prices.
- Random Forest Regressor did not show significant improvement even after the addition of polynomial features.
- The coefficient for 'Distance to the nearest MRT station' in the Linear Regression model suggests a decrease in house prices with increasing distance from MRT stations.
Linear Regression emerged as the more suitable model for this dataset. The addition of polynomial features did not significantly enhance the Random Forest model's performance.
- Further explore feature engineering, including higher-degree polynomials.
- Optimise the Random Forest model's hyperparameters.
- Evaluate other regression models or ensemble methods.
- Investigate additional features or external data sources to improve model predictions.
This analysis highlights the importance of feature selection, the impact of amenities' proximity on house prices, and the effectiveness of different modeling techniques for this prediction task.