c-susan / datasci_3_eda

HHA507 / Data Science / Assignment 3 / Exploratory Data Analysis

This repo contains my submission for HHA507 Assignment 3. This assignment focuses on Exploratory Data Analysis (EDA) using Python tools and techniques.

This repo contains the following:

  • A folder named automaticEDA that contains the output from the automated analysis in a .html file.
  • A folder named datasets that contains the dataset file used in the assignment.
  • README.md: provides an overview of the repo.
  • hha507assignment3.ipynb: This Jupyter Notebook contains the code and documentation for the assignment.

Assignment Details

  1. Univariate Analysis: Performed a univariate analysis on each numerical variable in the dataset, including measures of central tendency (mean, median, mode) and measures of spread (range, variance, standard deviation, IQR). Histograms are used to visualize the distribution of each variable. Several variables are log-transformed to reduce right skew and bring their distributions closer to normal.
  2. Bivariate Analysis: Created a scatterplot to view the relationship between two numerical variables, and a heatmap of Spearman rank correlations to view the pairwise correlations between variables in the dataset.
  3. Handling Outliers: Identified outliers in the dataset and applied a log transformation to reduce their influence, bringing the distributions closer to normal and making them easier to visualize.
  4. Automated Analysis: Performed an automated analysis and saved the output as an .html file in the automaticEDA folder.
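The univariate statistics, outlier check, and log transformation described above can be sketched in plain Python with the standard-library `statistics` and `math` modules. This is a minimal sketch, not the notebook's code: the function names `univariate_summary`, `iqr_outliers`, and `log_transform`, the use of the 1.5 * IQR rule to flag outliers, the choice of `log1p` for the log transform, and the sample values are all assumptions made for illustration.

```python
import math
import statistics


def univariate_summary(values):
    """Central tendency and spread for one numeric variable."""
    data = sorted(values)
    # "inclusive" = linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),  # first mode encountered if there are ties
        "range": data[-1] - data[0],
        "variance": statistics.variance(data),  # sample variance (n - 1)
        "std": statistics.stdev(data),  # sample standard deviation
        "iqr": q3 - q1,
    }


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences); k=1.5 is an assumption."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def log_transform(values):
    """log(1 + x): compresses a long right tail; assumes every value is > -1."""
    return [math.log1p(v) for v in values]


sample = [2, 3, 3, 4, 5, 6, 7, 40]  # hypothetical values, not the assignment data
print(univariate_summary(sample))
print(iqr_outliers(sample))  # → [40]
print([round(v, 3) for v in log_transform(sample)])
```

Quartiles here use `method="inclusive"`, which should match the default linear interpolation of pandas' `Series.quantile`, so the IQR ought to agree with a pandas-based notebook run on the same data.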

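The Spearman rank correlation behind the heatmap is the Pearson correlation computed on the ranks of each variable, with tied values sharing their average rank. The sketch below implements that definition directly with the standard library; the helper names and the hypothetical `age`/`cost`/`visits` columns are assumptions, not taken from the assignment dataset. In a pandas notebook the same matrix would typically come from `df.corr(method="spearman")`, which can then be drawn with a heatmap function such as `seaborn.heatmap`.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_matrix(columns):
    """Pairwise Spearman correlations for a dict of equal-length numeric columns."""
    names = list(columns)
    return {a: {b: spearman(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical columns for illustration only.
columns = {
    "age": [25, 32, 47, 51, 62],
    "cost": [1200, 1500, 2100, 2000, 3900],
    "visits": [5, 4, 3, 3, 1],
}
matrix = spearman_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(rho, 3) for other, rho in row.items()})
```

Because it works on ranks, Spearman's rho captures any monotonic relationship and is less sensitive to extreme values than Pearson's r, which suits skewed variables like the ones described above.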

Languages

HTML 87.1%, Jupyter Notebook 12.9%