seanmcrae / House-Prices-Advanced-Regression-Techniques

COMPREHENSIVE DATA EXPLORATION WITH PYTHON

This quote belongs to Thales of Miletus. Thales was a Greek/Phoenician philosopher, mathematician and astronomer, who is recognised as the first individual in Western civilisation known to have entertained and engaged in scientific thought (source: https://en.wikipedia.org/wiki/Thales).

I wouldn't say that knowing your data is the most difficult thing in data science, but it is time-consuming. Therefore, it's easy to overlook this initial step and jump too soon into the water.

So I tried to learn how to swim before jumping into the water. Following the chapter 'Examining your data' in Hair et al. (2013), I did my best to carry out a comprehensive, but not exhaustive, analysis of the data. This kernel is far from a rigorous study, but I hope it can be useful for the community, so I'm sharing how I applied some of those data analysis principles to this problem.

Despite the strange names I gave to the chapters, what we are doing in this kernel is something like:

1. Understand the problem. We'll look at each variable and do a philosophical analysis about their meaning and importance for this problem.
2. Univariable study. We'll just focus on the dependent variable ('SalePrice') and try to know a little bit more about it.
3. Multivariate study. We'll try to understand how the dependent variable and independent variables relate.
4. Basic cleaning. We'll clean the dataset and handle the missing data, outliers and categorical variables.
5. Test assumptions. We'll check if our data meets the assumptions required by most multivariate techniques.
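The five steps above can be sketched roughly as follows. This is only a minimal illustration, not the kernel's actual code. The DataFrame is a tiny invented stand-in for the competition data; the column names other than 'SalePrice' are assumed for the example; and the 50% missing-data cut-off and the log transform are illustrative choices rather than steps stated in this text.

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for the real training data; all values are invented.
df = pd.DataFrame({
    "SalePrice":   [120000, 135000, 150000, 180000, 210000, 250000, 320000, 755000],
    "GrLivArea":   [900, 1000, 1100, 1300, 1500, 1700, 2100, 4300],
    "OverallQual": [4, 5, 5, 6, 6, 7, 8, 10],
    "PoolQC":      [None, None, None, None, None, None, None, "Ex"],
    "CentralAir":  ["N", "Y", "Y", "Y", "Y", "Y", "Y", "Y"],
})

# 1. Understand the problem: list every variable and its type.
print(df.dtypes)

# 2. Univariable study: summarise the dependent variable and measure its skewness.
print(df["SalePrice"].describe())
skew_before = df["SalePrice"].skew()

# 3. Multivariate study: correlation of each numeric variable with SalePrice.
numeric = df.select_dtypes(include="number")
corr_with_price = numeric.corr()["SalePrice"].sort_values(ascending=False)
print(corr_with_price)

# 4. Basic cleaning: share of missing values per column, then drop columns
#    that are mostly empty (50% is an arbitrary cut-off for this sketch)
#    and one-hot encode the remaining categorical columns.
missing_share = df.isnull().mean()
cleaned = df.drop(columns=missing_share[missing_share > 0.5].index)
cleaned = pd.get_dummies(cleaned)

# 5. Test assumptions: a log transform is one common way to reduce positive
#    skew before applying techniques that assume normality.
cleaned["LogSalePrice"] = np.log(cleaned["SalePrice"])
skew_after = cleaned["LogSalePrice"].skew()
print(f"skewness before: {skew_before:.2f}, after log: {skew_after:.2f}")
```

On real data, the synthetic frame would be replaced by reading the training file, and each step would usually be accompanied by plots (histograms, scatter plots, a correlation heatmap) rather than printed summaries alone.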

License:Apache License 2.0


Languages

Language:Jupyter Notebook 100.0%