mihaelagrigore / Exploratory-Data-Analysis

Introduction to Exploratory Data Analysis (EDA). Made for beginners. Shows how to look at data to uncover patterns.

Home Page: https://www.kaggle.com/mishki/exploratory-data-analysis-pandas-numpy-seaborn

Introduction to Exploratory Data Analysis

This notebook is a basic introduction to Exploratory Data Analysis (EDA), the foundation of any Data Science project. Because it is meant for beginners, as I myself was when writing it, I will ask some very basic questions to understand why things are done the way they are in this industry.

Although modelling is the most highlighted part of the job, experienced Data Scientists say that preparing the data before they can start training models on it takes most of their time. When they speak about it, you often hear them say "this is not pretty" or "not glamorous". I actually find this detective work pretty beautiful. I hope you will like it too.

Data preparation involves multiple steps. When I first started training myself in this field, I began by reading about EDA in "7 Steps to Mastering Data Preparation for Machine Learning with Python" on KDnuggets. In this tutorial I will only discuss Exploratory Data Analysis.

This notebook uses data from the 2020 World Happiness Report.

How to use it:
Open the Jupyter Notebook in this folder. You can clone it, download it or just read it here. There is also a link at the top of the Notebook which takes you to the same Notebook on Kaggle.

Contents of this Notebook

1. Why EDA?
2. Pandas, Numpy, Matplotlib, Seaborn
3. Data types
4. Exploring categorical features
5. Exploring numerical features
6. Bivariate analysis
7. Outliers
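
To give a feel for what sections 3 to 7 look like in practice, here is a minimal sketch using pandas only. It is not taken from the notebook: the DataFrame, its column names (`country`, `region`, `score`, `gdp`) and all values are made up for illustration and do not come from the World Happiness Report. The 1.5 * IQR rule used for outliers is one common convention, assumed here; the notebook may use a different approach. Plotting with Matplotlib or Seaborn is left out to keep the example short.

```python
import pandas as pd

# Hypothetical toy data: names and values are invented for illustration,
# NOT taken from the World Happiness Report.
df = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D", "E", "F"],
        "region": ["North", "North", "South", "South", "South", "East"],
        "score": [7.1, 6.8, 5.2, 4.9, 5.0, 6.0],
        "gdp": [1.4, 1.3, 0.9, 0.8, 0.85, 5.0],  # 5.0 is a planted outlier
    }
)

# 3. Data types: which columns hold numbers and which hold text?
print(df.dtypes)

# 4. Categorical features: how often does each category occur?
region_counts = df["region"].value_counts()
print(region_counts)

# 5. Numerical features: count, mean, std, min, quartiles, max.
score_summary = df["score"].describe()
print(score_summary)

# 6. Bivariate analysis: Pearson correlation between two numerical columns.
corr = df["score"].corr(df["gdp"])
print(f"score vs gdp correlation: {corr:.2f}")

# 7. Outliers: flag values more than 1.5 * IQR outside the quartiles
#    (an assumed convention, chosen for this sketch).
q1, q3 = df["gdp"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["gdp"] < q1 - 1.5 * iqr) | (df["gdp"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
print(outliers)
```

With real data you would load the file with `pd.read_csv` in place of the hand-built DataFrame; the remaining steps stay the same.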

License: MIT License

Languages

Language: Jupyter Notebook 100.0%