kelseyaiello / eda

Exploratory Data Analysis

Exploratory Data Analysis

Exploratory data analysis (EDA) is the practice of investigating data to understand its fundamental structure. This process often leverages visualization to quickly understand univariate distributions and relationships between variables. Initial EDA questions are basic, including:

  • How large is the dataset (rows, columns)?
  • What are the variables present in the dataset?
  • What is the data type of each variable?
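
As a minimal sketch of these first questions, the snippet below uses only the Python standard library on a small made-up dataset. The variable names (`species`, `height_cm`, `weight_g`) and values are hypothetical and are not taken from this repository's notebooks.

```python
# Hypothetical toy dataset: each row is a dict, and None marks a missing value.
rows = [
    {"species": "a", "height_cm": 12.0, "weight_g": 30.0},
    {"species": "a", "height_cm": 15.0, "weight_g": 36.0},
    {"species": "b", "height_cm": None, "weight_g": 52.0},
    {"species": "b", "height_cm": 21.0, "weight_g": 55.0},
]

# Variables present: the union of all keys, in first-seen order.
columns = list(dict.fromkeys(key for row in rows for key in row))

# Dataset size as (rows, columns).
shape = (len(rows), len(columns))

# Data type of each variable, inferred from its non-missing values.
# A list with more than one type name would signal a mixed-type column.
dtypes = {
    col: sorted({type(row[col]).__name__ for row in rows if row.get(col) is not None})
    for col in columns
}

print("shape:", shape)
print("columns:", columns)
print("dtypes:", dtypes)
```

With a data frame library, the same three questions are typically answered by inspecting the frame's shape, column names, and column types.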

Following that basic information, it's common to dive deeper into particular variables, evaluating:

  • What is the distribution of the variable?
  • Is the variable ever missing (and if so, why)?
  • What are the basic summary statistics (mean, median, standard deviation) of the variable, and what is its range (min/max)?
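
The single-variable questions above could be sketched as follows, again with the standard library only. The `heights` values and the bin width of 5 are assumptions chosen for illustration. Counting missing values answers whether they occur, but not why; that usually requires knowledge of how the data were collected.

```python
import statistics
from collections import Counter

# Hypothetical measurements for one variable; None marks a missing value.
heights = [12.0, 15.0, None, 21.0, 18.0, 15.0]

# Separate observed values from missing ones before computing statistics.
observed = [value for value in heights if value is not None]
n_missing = len(heights) - len(observed)

summary = {
    "count": len(observed),
    "missing": n_missing,
    "mean": statistics.mean(observed),
    "median": statistics.median(observed),
    "stdev": statistics.stdev(observed),  # sample standard deviation
    "min": min(observed),
    "max": max(observed),
}

for name, value in summary.items():
    print(f"{name:>8}: {value}")

# A crude text histogram of the distribution, using bins of width 5.
bin_width = 5
bins = Counter(int(value // bin_width) * bin_width for value in observed)
for start in sorted(bins):
    print(f"{start:>3} to {start + bin_width:<3} {'#' * bins[start]}")
```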

Throughout this initial process, one often develops more specific questions about each variable, or the dataset more generally. For example:

  • Is the distribution of the variable consistent across groupings?
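
One way to approach a grouping question is to split the values by group and compare per-group summaries. The sketch below assumes a hypothetical grouping variable `species` and a numeric variable `height`; both are made up for illustration.

```python
import statistics
from collections import defaultdict

# Hypothetical (group, value) pairs.
rows = [
    ("a", 12.0), ("a", 15.0), ("a", 18.0),
    ("b", 21.0), ("b", 25.0), ("b", 23.0),
]

# Collect the values belonging to each group.
groups = defaultdict(list)
for species, height in rows:
    groups[species].append(height)

# Summarize each group separately so the summaries can be compared.
group_summary = {
    name: {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }
    for name, values in groups.items()
}

for name, stats in sorted(group_summary.items()):
    print(name, stats)
```

Large differences between the per-group summaries suggest the distribution is not consistent across groups; a plot per group is usually the next step.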

Finally, relationships between variables may be assessed:

  • Is there a correlation between these two (or any two) variables?
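
A minimal sketch of a correlation check follows, computing the Pearson correlation coefficient by hand. The helper name `pearson` and the sample values are hypothetical. Pearson's coefficient measures linear association only, so a scatter plot is still worth inspecting.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired measurements.
heights = [12.0, 15.0, 18.0, 21.0]
weights = [30.0, 36.0, 42.0, 48.0]   # increases linearly with height
ages = [9.0, 7.0, 5.0, 3.0]          # decreases linearly with height

r_weight = pearson(heights, weights)
r_age = pearson(heights, ages)
print(round(r_weight, 3), round(r_age, 3))
```

On Python 3.10 or later, `statistics.correlation` computes the same coefficient.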

Exploratory data analysis is a crucial step toward understanding your data before any statistical analysis.

License: MIT License


Languages

  • Jupyter Notebook 50.8%
  • HTML 49.1%
  • R 0.1%