Helzheng123 / datasci_3_eda

Engage in the critical phase of Exploratory Data Analysis (EDA) using the tools and techniques from Python to uncover patterns, spot anomalies, test hypotheses, and identify the main structures of your dataset.

datasci_3_eda

This is an assignment for HHA 507.

Objective: Engage in the critical phase of Exploratory Data Analysis (EDA) using the tools and techniques from Python to uncover patterns, spot anomalies, test hypotheses, and identify the main structures of your dataset.

Instructions:

  1. Univariate Analysis:
  • Load a dataset of your choice in a Colab notebook (.ipynb) or a Python script (.py). You can use one from previous assignments or find a new one.
  • Manually perform a univariate analysis to understand the distribution of each variable. This includes calculating measures of central tendency (mean, median, mode) and measures of spread (range, variance, standard deviation, IQR).
  • Visualize the distribution of select numerical variables using histograms.
  2. Bivariate Analysis:
  • Analyze the relationship between pairs of variables.
    • Use scatter plots to explore potential relationships between two numerical variables.
    • For categorical and numerical variable pairs, use boxplots.
  • Compute correlation coefficients for numerical variables and document any strong correlations observed.
  3. Handling Outliers:
  • Identify outliers in your dataset using the IQR method or visualization tools.
  • Decide on an approach to handle these outliers (e.g., remove, replace, or retain) and justify your decision in a markdown cell.
  • If no values fall beyond 1, 2, or 3 standard deviations from the mean (absolute z-scores of at least 1, 2, or 3), please state that and support it with your code.
  4. Automated Analysis:
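
The numeric parts of the univariate, bivariate, and outlier steps above can be sketched with the Python standard library alone, which keeps the "manual" calculations visible. This is a minimal illustration, not the notebook used in this repo: the helper names (`describe`, `iqr_outliers`, `zscore_outliers`, `pearson_r`) and the `ages` and `visits` lists are made up for the example, and the quartiles use the "inclusive" method, which is only one of several common conventions. Histograms, scatter plots, and boxplots are left out of the sketch.

```python
import math
import statistics


def describe(values):
    """Central tendency and spread for a list of numbers."""
    # Quartiles via the "inclusive" method (an assumption; other
    # conventions give slightly different cut points).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.multimode(values),  # list, since ties are possible
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,
    }


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [v for v in values if abs((v - mean) / std) > threshold]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data, invented for this example only.
ages = [23, 25, 25, 27, 29, 31, 33, 35, 90]
visits = [1, 2, 2, 3, 3, 4, 4, 5, 12]

print(describe(ages))
print("IQR outliers:", iqr_outliers(ages))
for t in (1.0, 2.0, 3.0):
    print(f"|z| > {t}:", zscore_outliers(ages, threshold=t))
print("Pearson r:", pearson_r(ages, visits))
```

Running the z-score check at several thresholds shows why the choice matters: a value can be flagged at one cutoff and pass at a stricter one, so the threshold used should be stated alongside the decision to remove, replace, or retain.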

Please refer to datasets to view the dataset used for this repo, and to the automatedEDA folder to view the automated EDA report produced with pandas profiling.


Languages

HTML 97.3%, Jupyter Notebook 2.7%