IvyLinMS / data-512-final-project

data-512-final-project

This is the final project for Data512 Human-Centered Data Science. The goal is to practice completing a whole data science project: gathering and processing data, setting up the research questions (with a null hypothesis), proposing a suitable methodology, reaching a final conclusion, and writing a structured research report. For this project, I used COVID-19 data on daily cases, infection rate, and vaccination to look for a correlation between infection rate and vaccine rate. All analysis is performed in a single Jupyter notebook, hcds-final-project.ipynb.

Data sources

Data processing

  • Clean the data and drop unneeded columns
  • Melt the confirmed-case data so that each row represents the confirmed cases for one day
  • Standardize the FIPS column across the three datasets
  • Keep only the Palm Beach County, FL data (FIPS == '12099')
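The processing steps above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the sample rows, the column names (`Admin2`, `Province_State`, `confirmed`), and the helper `standardize_fips` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical wide-format sample: one row per county, one column per date.
cases_wide = pd.DataFrame({
    "FIPS": [12099.0, 51177.0],
    "Admin2": ["Palm Beach", "Spotsylvania"],
    "Province_State": ["Florida", "Virginia"],
    "1/22/20": [0, 0],
    "1/23/20": [1, 0],
    "1/24/20": [3, 1],
})


def standardize_fips(series):
    """Convert numeric FIPS codes to zero-padded 5-character strings."""
    return series.astype(float).astype(int).astype(str).str.zfill(5)


# Melt: every date column becomes a row, so each row is one county on one day.
id_cols = ["FIPS", "Admin2", "Province_State"]
cases_long = cases_wide.melt(id_vars=id_cols, var_name="date", value_name="confirmed")
cases_long["date"] = pd.to_datetime(cases_long["date"], format="%m/%d/%y")

# Use the same string format for FIPS in every dataset before joining or filtering.
cases_long["FIPS"] = standardize_fips(cases_long["FIPS"])

# Keep only Palm Beach County, FL.
palm_beach = cases_long[cases_long["FIPS"] == "12099"].reset_index(drop=True)
print(palm_beach)
```

Storing FIPS as a zero-padded string keeps codes with a leading zero intact, which matters when the same column is compared across datasets.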

Data Visualization

  • Estimated prevalence of mask wearing in Palm Beach County, FL
  • Time series of cumulative COVID-19 cases, Palm Beach County, FL
  • Daily new COVID-19 cases and 7-day rolling average, Palm Beach County, FL
  • 7-day rolling average infection rate (daily new cases / population), Palm Beach County, FL
  • Difference (diff) of the 7-day rolling average infection rate, Palm Beach County, FL
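The series behind these plots can be derived from the cumulative counts roughly as shown below. This is a sketch under stated assumptions: the counts and the population figure are made up, the column names are hypothetical, and reading "Diff" as the day-over-day change of the rolling rate is my interpretation.

```python
import pandas as pd

# Hypothetical cumulative counts and population, for illustration only.
population = 1_000_000
df = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=10, freq="D"),
    "confirmed": [100, 110, 125, 145, 170, 200, 235, 275, 320, 370],
})

# Daily new cases are the day-over-day change in the cumulative count.
df["new_cases"] = df["confirmed"].diff()

# 7-day rolling average; the first rows are NaN until 7 daily values exist.
df["new_cases_7d"] = df["new_cases"].rolling(window=7).mean()

# Infection rate = daily new cases / population (here on the smoothed series).
df["infection_rate_7d"] = df["new_cases_7d"] / population

# Day-over-day change of the smoothed infection rate.
df["infection_rate_7d_diff"] = df["infection_rate_7d"].diff()

print(df.tail(3))
```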

Data Analysis

  • Palm Beach County's infection rate shows little change, especially in the 7-day rolling average visualization
  • Since Palm Beach County has no mask mandate, I looked for another county with similar mask-wearing data that does have a mask mandate
  • I used the mask-wearing survey data to find a county with a similar mask-wearing percentage
  • Spotsylvania County, VA has a similar mask-wearing percentage, in particular "ALWAYS == 0.785"
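One way to run this kind of search is to rank counties by how far their ALWAYS share is from Palm Beach County's. The sketch below assumes a survey table with `COUNTYFP` and `ALWAYS` columns; the two rows with placeholder codes are invented for illustration, and only the 0.785 value comes from the analysis above.

```python
import pandas as pd

# Hypothetical survey rows; "00001" and "00002" are placeholder county codes.
mask = pd.DataFrame({
    "COUNTYFP": ["12099", "51177", "00001", "00002"],
    "ALWAYS": [0.785, 0.785, 0.444, 0.801],
})

# Share of respondents who "always" wear a mask in the reference county.
target = mask.loc[mask["COUNTYFP"] == "12099", "ALWAYS"].iloc[0]

# Rank every other county by absolute distance from the reference value.
others = mask[mask["COUNTYFP"] != "12099"].copy()
others["gap"] = (others["ALWAYS"] - target).abs()
closest = others.sort_values("gap").head(3)

print(closest)
```

The shortlist still has to be checked by hand for whether each candidate county had a mask mandate, since the survey data does not record that.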

Research Questions:

Based on the visualization "7 days rolling average infection rate and Vaccine rate at Palm Beach, FL", once COVID-19 vaccination started in 2021-01, the infection rate went down until the Delta variant emerged around 2021-07.

Null hypothesis: There is no correlation between the vaccine rate and the 7-day rolling average of the daily infection rate.

Considering Delta Variant Impact:

Therefore, I split the data into three time periods to find the correlation between infection rate and vaccine rate in each.

  • Before 2021-01: before vaccination started
  • From 2021-01 to 2021-07: vaccine rate increases and infection rate decreases
  • After 2021-07: Delta variant
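A split like this can be expressed as a small labeling function. In the sketch below, the exact cut-off days (the first of each month) and the period names are assumptions; the write-up only gives the months.

```python
import pandas as pd

# Assumed boundaries: the first day of the months named above.
VACCINE_START = pd.Timestamp("2021-01-01")
DELTA_START = pd.Timestamp("2021-07-01")


def label_period(date):
    """Assign a date to one of the three analysis periods."""
    if date < VACCINE_START:
        return "pre_vaccine"
    if date < DELTA_START:
        return "vaccine_pre_delta"
    return "delta"


# Hypothetical dates to show how the boundaries are handled.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-11-15", "2021-01-01", "2021-03-10", "2021-07-01", "2021-09-05"]
    )
})
df["period"] = df["date"].map(label_period)
print(df)
```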

Conclusion:

Using the pandas .corr() function and a seaborn heatmap, we can see:

  • No correlation between infection rate and vaccine rate before 2021-01, when vaccination had not yet started
  • Very strong negative correlation between infection rate and vaccine rate after vaccination started in 2021-01 and before the Delta variant in 2021-07, consistent with the vaccine helping a lot
  • Very slight negative correlation between infection rate and vaccine rate after 2021-07, suggesting the vaccine did not help as much against the Delta variant
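The per-period comparison can be reproduced along these lines. The data here is synthetic and the column names (`period`, `vaccine_rate`, `infection_rate_7d`) are hypothetical; the point is only to show how `Series.corr` (Pearson by default) behaves in each period.

```python
import numpy as np
import pandas as pd


def correlation_by_period(df):
    """Pearson correlation of vaccine rate vs. infection rate within each period."""
    return {
        period: group["vaccine_rate"].corr(group["infection_rate_7d"])
        for period, group in df.groupby("period")
    }


# Synthetic data: no vaccination in the first period, a perfectly linear
# decline of the infection rate with rising vaccine rate in the second.
n = 50
vaccine_rate = np.linspace(0.0, 0.5, n)
df = pd.DataFrame({
    "period": ["pre_vaccine"] * n + ["vaccine_pre_delta"] * n,
    "vaccine_rate": np.concatenate([np.zeros(n), vaccine_rate]),
    "infection_rate_7d": np.concatenate(
        [np.linspace(1e-4, 5e-4, n), 1e-3 - 1e-3 * vaccine_rate]
    ),
})

result = correlation_by_period(df)
print(result)
```

Note that when the vaccine rate is constant within a period (for example all zeros before rollout), the Pearson coefficient is undefined and pandas returns NaN rather than 0, which a heatmap typically renders as an empty cell. If that applies to the first period, "no correlation" there is better read as "cannot be computed" than as evidence of independence.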

From the statsmodels.api OLS regression results:

  • The p-value is reported as 0.000, which is < 0.05, so we can reject the null hypothesis that there is no correlation between the vaccine rate and the 7-day rolling average of the daily infection rate.
  • R-squared is 0.851, meaning the linear regression model explains about 85% of the variance in the dependent variable.
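To make the R-squared figure concrete, the sketch below fits a straight line by ordinary least squares and computes R-squared as 1 - SS_res / SS_tot. It deliberately uses NumPy rather than statsmodels, so it is not the notebook's code; the helper `fit_line` and the data points are hypothetical.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with a constant column for the intercept.
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(np.sum(residuals ** 2))        # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))   # total variation
    return float(coef[0]), float(coef[1]), 1.0 - ss_res / ss_tot


# Hypothetical points lying exactly on y = 1 - 2x, so the fit is perfect.
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
y = np.array([1.0, 0.8, 0.6, 0.4, 0.2])
intercept, slope, r2 = fit_line(x, y)
print(intercept, slope, r2)
```

An R-squared of 0.851 therefore means the residual variation is about 15% of the total variation around the mean.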

Known Issues:

  • Palm Beach County, FL never had a mask mandate
  • The result can be affected by other factors, such as summer break and nationwide reopening
  • Spotsylvania County, VA has only about 1/10 of the population of Palm Beach County, FL

About

License:MIT License

