JinalShah2002 / House-Prices-Challenge-Solution

My solution to the House Prices Challenge on Kaggle.

Predicting Housing Prices

What is this repository?

As an aspiring ML engineer, I wanted to put my machine learning skills to the test, so I tackled the House Prices Challenge on Kaggle. The goal of this challenge is to predict the prices of houses in Ames, Iowa from a given set of 79 features. The project let me practice critical Data Science & Machine Learning techniques.

This repository is organized into self-explanatory folders.

The model is evaluated using the Root Mean Square Error (RMSE), which is the metric we are trying to minimize. My best model has an RMSE of 0.13757, which currently ranks in the top 43%. In reality, my solution would rank much higher, because some leaderboard entries are not valid:

  • Some solutions have an infeasible RMSE of 0.0. No Machine Learning model can predict with such accuracy, so I suspect cheating occurred here.
  • Some solutions have an RMSE of 0.00044. After inspecting such solutions, I found that they are invalid because the competitors simply submit the answers from a similar challenge (Boston Housing Prices). Once again, I believe this is cheating, since no real Machine Learning methodology is being applied.
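To make the metric concrete, here is a minimal sketch of how an RMSE score like the one above can be computed. It assumes the error is taken on the natural logarithm of the prices, as the Kaggle competition page describes; the function name `rmse_log` and the sample prices are made up for illustration and do not come from this repository.

```python
import math


def rmse_log(actual_prices, predicted_prices):
    """RMSE between the natural logs of actual and predicted prices.

    Assumption: scoring is done on log prices, so a 10% miss costs the
    same on a cheap house as on an expensive one.
    """
    if len(actual_prices) != len(predicted_prices):
        raise ValueError("inputs must have the same length")
    if not actual_prices:
        raise ValueError("inputs must not be empty")
    squared_errors = [
        (math.log(predicted) - math.log(actual)) ** 2
        for actual, predicted in zip(actual_prices, predicted_prices)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up prices for illustration only; not taken from the competition data.
actual = [200000.0, 150000.0, 320000.0]
predicted = [210000.0, 140000.0, 300000.0]
score = rmse_log(actual, predicted)
print(f"RMSE on log prices: {score:.5f}")
```

A score of exactly 0.0 would require every predicted price to match the true price, which is why such leaderboard entries look suspicious.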

Final Model: My best model is a tuned CatBoost Model.

Note: you may use my solution as a reference; however, I would strongly advise you to tackle this challenge on your own. The only way you will get better at machine learning is to practice it yourself. I do not condone, nor am I responsible for, any cheating that may occur as a result of this repository.

Machine Learning Project Checklist:

I use this checklist for every ML project. It goes through every major step & ensures that I have done everything correctly.

  1. Framing the Problem - Complete
  2. Getting the Data - Complete
  3. Exploring the Data - Complete
  4. Data Preprocessing - Complete
  5. Model Development - Complete
  6. Model Tuning/Ensemble Learning - Complete
  7. Deploying Model on Test Set & Presentation of Solution - Complete

What tools are used in this project?

References

Future Adjustments

In reality, there are countless adjustments I could make to improve my score; however, here are a couple of fruitful ones:

  • Combine the Tuned-CatBoost model with some other models (Linear Regression & Support Vector Machines seem promising)
  • Feature Engineering: I could reduce the number of categories for certain features.
  • Feature Importance: Further feature selection, using my model to make better selections of features.
  • Possibly incorporate outside data, as many credible top-ranked solutions do.
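Two of the adjustments above can be sketched in a few lines of plain Python: blending the predictions of several models with a weighted average, and cutting down the categories of a feature by merging rare values. This is only an illustration under assumptions, not code from this repository: the function names, the `"Other"` label, the threshold, and all sample values are hypothetical.

```python
from collections import Counter


def blend_predictions(predictions_by_model, weights):
    """Weighted average of per-model predictions, computed row by row."""
    if set(predictions_by_model) != set(weights):
        raise ValueError("each model needs exactly one weight")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    lengths = {len(preds) for preds in predictions_by_model.values()}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of rows")
    n_rows = lengths.pop()
    return [
        sum(weights[name] * preds[i] for name, preds in predictions_by_model.items())
        / total_weight
        for i in range(n_rows)
    ]


def collapse_rare_categories(values, min_count, other_label="Other"):
    """Replace categories seen fewer than min_count times with one label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Hypothetical predictions from three models for two houses.
predictions = {
    "catboost": [200000.0, 150000.0],
    "linear_regression": [210000.0, 140000.0],
    "svm": [190000.0, 160000.0],
}
model_weights = {"catboost": 0.6, "linear_regression": 0.2, "svm": 0.2}
print(blend_predictions(predictions, model_weights))

# Hypothetical categorical feature with two rare values.
print(collapse_rare_categories(["A", "A", "B", "C", "A"], min_count=2))
```

In practice the blend weights and the rarity threshold would be chosen by validation score rather than fixed by hand as they are here.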

Closing Remarks

This project was very enjoyable, and I definitely learned a lot along the way! I would recommend this challenge to anyone who is looking to dive into Machine Learning & Data Science. It is quite simple, and the dataset is relatively small & not overwhelming. Overall, this challenge was really fun and a great learning experience!

About the author

I am an undergraduate student at Rutgers University New Brunswick, pursuing bachelor's degrees in Computer Science and Cognitive Science. I am also pursuing a certificate in Data Science. I have a passion for AI, and I am always intrigued by its power. Feel free to contact me via LinkedIn.
Enjoy!
Jinal Shah

Languages

Jupyter Notebook 99.7%, Python 0.3%