This project has been confirmed as successful by a Udacity reviewer.
In this project, I applied basic machine learning concepts to data collected on housing prices in the Boston, Massachusetts area in order to predict the selling price of a new home. I first used the NumPy library to analyze the data and obtain important features and descriptive statistics about the dataset. Next, I split the data into training and testing subsets and determined a suitable performance metric for this problem. I then analyzed performance graphs for a learning algorithm with varying parameters and training set sizes. Finally, I tested the model on a new sample and compared the predicted selling price to my statistics. The result was less than one standard deviation away from the mean.
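The descriptive statistics and the final "within one standard deviation" check can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `prices` array, the `predicted_price` value, and the helper `within_one_std` are all hypothetical stand-ins, since the real dataset and prediction are not reproduced here.

```python
import numpy as np

# Hypothetical stand-in for the price column; the real project data is not shown here.
prices = np.array([310_000, 450_000, 525_000, 405_000, 620_000, 480_000], dtype=float)

# Basic descriptive statistics computed with NumPy.
stats = {
    "min": np.min(prices),
    "max": np.max(prices),
    "mean": np.mean(prices),
    "median": np.median(prices),
    "std": np.std(prices),  # population standard deviation (ddof=0)
}


def within_one_std(value, data):
    """Return True if value lies less than one standard deviation from the mean of data."""
    return abs(value - np.mean(data)) < np.std(data)


predicted_price = 430_000.0  # hypothetical model output, chosen only for illustration

print(stats)
print(within_one_std(predicted_price, prices))
```

Comparing a prediction against the mean and standard deviation is only a coarse sanity check: it shows the prediction is plausible for the dataset, not that it is accurate for that particular home.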
Through this project I became acquainted with working with datasets in Python and applying basic machine learning techniques using NumPy and scikit-learn.
Things I learned from this project:
- How to use NumPy to investigate the latent features of a dataset.
- How to analyze various learning performance plots for variance and bias.
- How to determine the best-guess model for predictions from unseen data.
- How to evaluate a model’s performance on unseen data using previous data.
- Model fitting, data train & test split, cross-validation, & parameter optimization with grid search.
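
The last item in the list above can be sketched with scikit-learn as shown below. This is an illustrative sketch under stated assumptions rather than the project's actual code: the data is synthetic, and the choice of a decision tree regressor, the `max_depth` grid, the R² metric, and the `ShuffleSplit` settings are all assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, ShuffleSplit, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for the housing features and prices (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 50_000 * X[:, 0] - 20_000 * X[:, 1] + rng.normal(0, 10_000, size=200)

# Hold out 20% of the data for a final test on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Cross-validation scheme used inside the grid search (training data only).
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

# Search over tree depth (an assumed hyperparameter), scoring each candidate by R^2.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": list(range(1, 11))},
    scoring="r2",
    cv=cv,
)
search.fit(X_train, y_train)

best_depth = search.best_params_["max_depth"]
# GridSearchCV refits the best model on the full training set by default,
# so search.predict uses that refitted model.
test_r2 = r2_score(y_test, search.predict(X_test))

print(best_depth, round(test_r2, 3))
```

Keeping the test subset out of the grid search means the final R² score reflects performance on data the model never saw during fitting or parameter selection.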