The goal of this project was to locate data on life expectancy (LE) across the United States over time and to analyze it with data science tools, in order to draw conclusions about how life expectancy differs across several variables. We used data from the following sources:
https://healthinequality.org/data/ - The Health Inequality Project - Set of multiple datasets
https://data.hrsa.gov/data/download - Health Resources & Services Administration
https://hifld-geoplatform.opendata.arcgis.com/ - Homeland Infrastructure Foundation-Level Data
The project was done almost entirely in Python in Jupyter Notebook, using a variety of libraries, including:
- Pandas
- Matplotlib
- Seaborn
- NumPy
- SciPy
We divided among ourselves the questions about different variables that could affect life expectancy. The questions were:
- Are there differences in the national LE by gender?
- Are there differences in national LE by income?
- Which states have the highest/lowest LE?
- Is LE affected by the percent of a state's population that is uninsured, African American, or Hispanic?
- Are there differences in state LE by access to quality health care?
Each person then took on their question by cleaning the data, analyzing it, and reporting their findings.
Used the Health Inequality Project dataset, which we first pulled into Jupyter for cleaning. The columns were cut down to only those necessary for the analysis.
The analysis involved creating separate dataframes for males and females and then computing the national average LE per year for each gender. We then created a line graph to visualize the difference between the two averages.
We found that the average female LE is 85.54 years, while the average male LE is 81.81 years.
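A minimal pandas sketch of the per-year, per-gender averaging described above. It assumes hypothetical column names (`year`, `gender`, `le`) rather than the dataset's actual ones, and it swaps the separate male and female dataframes for a single `groupby`, which yields the same per-year averages. The toy rows are made up for illustration.

```python
import pandas as pd


def national_le_by_gender(df: pd.DataFrame) -> pd.DataFrame:
    """Average LE for every year, one column per gender.

    Assumes hypothetical columns 'year', 'gender' ('F' or 'M') and 'le'.
    """
    # Keep only the columns the analysis needs
    trimmed = df[["year", "gender", "le"]]
    # Mean LE for each (year, gender) pair, then spread genders into columns
    return trimmed.groupby(["year", "gender"])["le"].mean().unstack("gender")


# Toy rows for illustration only; these are not values from the dataset
toy = pd.DataFrame({
    "year": [2001, 2001, 2001, 2001, 2002, 2002],
    "gender": ["F", "F", "M", "M", "F", "M"],
    "le": [85.0, 86.0, 81.0, 82.0, 85.5, 81.8],
})
by_year = national_le_by_gender(toy)
print(by_year)
# by_year.plot() would draw one line per gender (requires Matplotlib)
```

Because each gender ends up in its own column, the line graph and the overall gender averages both come straight from this one table.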
Used the Health Inequality Project dataset. It was cleaned by cutting down the columns and grouping the income percentiles into quartiles for easier analysis.
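One way to group income percentiles into quartiles is `pd.cut` with fixed bin edges. This is a sketch that assumes a hypothetical integer column `pctile` holding percentiles 1 to 100; the project's actual column name and binning may differ.

```python
import pandas as pd


def add_income_quartile(df: pd.DataFrame, pct_col: str = "pctile") -> pd.DataFrame:
    """Return a copy of df with a 'quartile' column labelled Q1 to Q4.

    Assumes a hypothetical column of integer income percentiles 1-100.
    Intervals are right-closed: 1-25 -> Q1, 26-50 -> Q2, 51-75 -> Q3, 76-100 -> Q4.
    """
    out = df.copy()
    out["quartile"] = pd.cut(
        out[pct_col],
        bins=[0, 25, 50, 75, 100],
        labels=["Q1", "Q2", "Q3", "Q4"],
    )
    return out


toy = pd.DataFrame({"pctile": [1, 25, 26, 50, 51, 75, 76, 100]})
labelled = add_income_quartile(toy)
print(labelled["quartile"].tolist())
# ['Q1', 'Q1', 'Q2', 'Q2', 'Q3', 'Q3', 'Q4', 'Q4']
```

The resulting column is categorical, so passing `observed=True` to later `groupby` calls keeps empty quartiles out of the results.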
We then visualized the quartiles using a line chart and a box plot.
We also wanted to compare how females and males differed within individual quartiles, so we looked at Q1 and Q4 to see the difference.
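The Q1 versus Q4 comparison can be sketched as a filter followed by a pivot, with the female minus male gap as an extra column. Column names (`quartile`, `gender`, `le`) are hypothetical, and the toy values are invented for illustration.

```python
import pandas as pd


def gender_gap_by_quartile(df: pd.DataFrame, quartiles=("Q1", "Q4")) -> pd.DataFrame:
    """Mean LE per gender within the chosen income quartiles, plus the F - M gap.

    Assumes hypothetical columns 'quartile', 'gender' ('F' or 'M') and 'le'.
    """
    subset = df[df["quartile"].isin(quartiles)]
    # observed=True keeps unused categories out of the groupby result
    # when 'quartile' is a categorical column
    table = (
        subset.groupby(["quartile", "gender"], observed=True)["le"]
        .mean()
        .unstack("gender")
    )
    table["gap"] = table["F"] - table["M"]
    return table


# Toy values for illustration only
toy = pd.DataFrame({
    "quartile": ["Q1", "Q1", "Q2", "Q2", "Q4", "Q4"],
    "gender": ["F", "M", "F", "M", "F", "M"],
    "le": [80.0, 75.0, 83.0, 79.0, 88.0, 85.0],
})
gaps = gender_gap_by_quartile(toy)
print(gaps)
```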
Used the Health Inequality Project dataset. It was cleaned by cutting down the columns, and we then decided to use the average LE across the four income quartiles as the measure of LE for each state.
We plotted the top 5 and bottom 5 states for males and for females, showing the opposite gender's LE for comparison.
This showed that the state with the highest male LE is Montana at 83.08 years and the state with the highest female LE is Vermont at 86.18 years. This is consistent with our earlier finding that females tend to live longer in general.
It also showed that Nevada has the lowest LE for both males (79.64 years) and females. Other takeaways from this analysis were that female LE exceeds male LE by an average of 3.3 years per state, and that the top and bottom states differ slightly between genders, although a few states appear on both the male and female lists.
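The state-level steps above (averaging the four quartile LE values, picking the top and bottom states, lining up the opposite gender, measuring the average gender gap and finding shared states) can be sketched as follows. The wide column names (`le_q1_F` to `le_q4_M`) and the `state` column are hypothetical, and the toy numbers are made up.

```python
import pandas as pd


def state_le(df: pd.DataFrame, gender: str) -> pd.Series:
    """Mean of the four income-quartile LE columns for one gender, indexed by state.

    Assumes a hypothetical 'state' column and wide columns 'le_q1_F' ... 'le_q4_F'
    (likewise '_M').
    """
    cols = [f"le_q{q}_{gender}" for q in range(1, 5)]
    return df.set_index("state")[cols].mean(axis=1)


def gender_comparison(le_f: pd.Series, le_m: pd.Series, n: int = 5) -> dict:
    """Top/bottom n states per gender, mean F - M gap, and states shared by both lists."""
    top_f, top_m = le_f.nlargest(n), le_m.nlargest(n)
    bottom_f, bottom_m = le_f.nsmallest(n), le_m.nsmallest(n)
    return {
        "top_f": top_f,
        "bottom_f": bottom_f,
        # Male LE for the top female states, shown side by side for comparison
        "top_f_male_le": le_m.loc[top_f.index],
        "mean_gap": (le_f - le_m).mean(),  # subtraction aligns on the state index
        "shared_top": set(top_f.index) & set(top_m.index),
        "shared_bottom": set(bottom_f.index) & set(bottom_m.index),
    }


# Toy values for illustration only
toy = pd.DataFrame({
    "state": ["A", "B", "C"],
    "le_q1_F": [80.0, 82.0, 78.0], "le_q2_F": [82.0, 84.0, 80.0],
    "le_q3_F": [84.0, 86.0, 82.0], "le_q4_F": [86.0, 88.0, 84.0],
    "le_q1_M": [76.0, 79.0, 75.0], "le_q2_M": [78.0, 81.0, 77.0],
    "le_q3_M": [80.0, 83.0, 79.0], "le_q4_M": [82.0, 85.0, 81.0],
})
le_f, le_m = state_le(toy, "F"), state_le(toy, "M")
summary = gender_comparison(le_f, le_m, n=1)
print(summary["top_f"], summary["mean_gap"], summary["shared_bottom"], sep="\n")
```

With real data, `n=5` reproduces the top 5 and bottom 5 lists; swapping the arguments gives the male-ranked version.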
Used the Health Inequality Project dataset. It was cleaned by cutting down the columns; we then computed, for each state, the county-population-weighted average of each column of interest. With these state averages calculated, we merged the resulting dataframe with the state dataframe from Q3 to add each state's life expectancy.
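A population-weighted state average is the sum of value times population over the sum of population. The sketch below computes it with two `groupby` sums and then merges in state LE; the column names (`state`, `pop`, `pct_uninsured`, `le`) are hypothetical and the toy values are invented.

```python
import pandas as pd


def weighted_state_means(counties, value_cols, weight_col="pop", state_col="state"):
    """Population-weighted mean of each value column, one row per state.

    Assumes hypothetical county-level columns: a state identifier, a
    population count and the value columns (e.g. percent uninsured).
    Rows with missing values should be dropped or filled beforehand.
    """
    # Sum of value * population per state ...
    weighted = counties[value_cols].mul(counties[weight_col], axis=0)
    numer = weighted.groupby(counties[state_col]).sum()
    # ... divided by total population per state
    denom = counties[weight_col].groupby(counties[state_col]).sum()
    return numer.div(denom, axis=0)


# Toy values for illustration only
counties = pd.DataFrame({
    "state": ["A", "A", "B"],
    "pop": [100, 300, 50],
    "pct_uninsured": [10.0, 20.0, 8.0],
})
state_means = weighted_state_means(counties, ["pct_uninsured"])
state_le = pd.DataFrame({"state": ["A", "B"], "le": [80.0, 82.0]})
# Attach each state's LE to its weighted averages
merged = state_means.reset_index().merge(state_le, on="state")
print(merged)
```

Weighting by population keeps a small rural county from counting as much as a large urban one when the state figure is formed.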
By charting the percent uninsured by state and comparing it against Q3's LE bar chart, we can see that several states, including Vermont and Montana, rank among the states with the highest share of insured residents and the greatest LE. Conversely, states such as Nevada have the lowest LE and a population that ranks among the most uninsured.
We created a scatter plot to check for a direct correlation between state average LE and percent uninsured by state. Because the chart did not show a clear relationship between the variables, we computed the Pearson correlation coefficient. It was negative: states with a higher percentage of uninsured residents tended to have lower LE.
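A Pearson correlation between two state-level columns can be computed with `scipy.stats.pearsonr`, which returns the coefficient and a two-sided p-value. This is a sketch with hypothetical column names (`pct_uninsured`, `le`) and invented toy values.

```python
import pandas as pd
from scipy import stats


def le_pearson(states: pd.DataFrame, x_col: str, le_col: str = "le"):
    """Pearson r and two-sided p-value between a state-level variable and LE."""
    r, p = stats.pearsonr(states[x_col], states[le_col])
    return float(r), float(p)


# Toy values for illustration only
states = pd.DataFrame({
    "pct_uninsured": [5.0, 10.0, 15.0, 20.0],
    "le": [84.0, 81.0, 81.0, 78.0],
})
r, p = le_pearson(states, "pct_uninsured")
print(f"r = {r:.3f}, p = {p:.3f}")
```

Reporting the p-value alongside r shows how strong the evidence is, and r only measures linear association, so a negative value indicates that the two move in opposite directions, not that one causes the other.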
We then created scatter plots to see the relationship between State Average LE and Percent of a State’s African-American and Hispanic population.
Again, neither chart showed a clear relationship, so we computed a Pearson correlation for each. Both correlations were negative: states with a higher percentage of African American or Hispanic residents tended to have lower LE.
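When several variables are tested against LE, `DataFrame.corr(method="pearson")` gives all the coefficients at once. This sketch assumes hypothetical column names (`pct_black`, `pct_hispanic`, `le`) and toy values; unlike `pearsonr`, it does not report p-values.

```python
import pandas as pd


def le_correlations(states: pd.DataFrame, cols: list, le_col: str = "le") -> pd.Series:
    """Pearson r between LE and each listed state-level column."""
    corr = states[cols + [le_col]].corr(method="pearson")
    # Keep the LE column of the matrix, minus LE's correlation with itself
    return corr[le_col].drop(le_col)


# Toy values for illustration only; column names are hypothetical
states = pd.DataFrame({
    "pct_black": [10.0, 20.0, 30.0, 40.0],
    "pct_hispanic": [5.0, 15.0, 10.0, 20.0],
    "le": [82.0, 80.0, 79.0, 77.0],
})
rs = le_correlations(states, ["pct_black", "pct_hispanic"])
print(rs)
```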
Used the HIFLD data for county- and state-level hospital data and the HRSA data for health care quality.
We first looked at hospitals versus Medically Underserved Areas, using the Index of Medical Underservice (IMU) score as the measure of quality. The IMU is a score from 0 to 100 that HRSA computes from demographic and health care access indicators for a given county.
In the chart, blue shows the number of hospitals and orange shows the IMU score. We then chose two states as our samples, Alabama and Minnesota, which have average LEs of about 77 and 82 years respectively. Alabama has 133 health care facilities and Minnesota has 104.
Despite having more health care facilities, 18% of Alabama's population lives in areas with an IMU score above 60, while only 3% of Minnesota's population does. This could help explain why Alabama's LE is lower.
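The two quantities compared for Alabama and Minnesota (facilities per state and the percent of population living in areas above an IMU threshold) can be sketched as below. The columns `state`, `pop` and `imu` are hypothetical, the threshold defaults to 60 only to mirror the text above, and the toy states "X" and "Y" are invented.

```python
import pandas as pd


def facilities_per_state(facilities: pd.DataFrame) -> pd.Series:
    """Number of facility rows per state (assumes a hypothetical 'state' column)."""
    return facilities["state"].value_counts()


def pct_pop_above_imu(areas: pd.DataFrame, threshold: float = 60.0) -> pd.Series:
    """Percent of each state's population living in areas with IMU score > threshold.

    Assumes hypothetical area-level columns 'state', 'pop' and 'imu'.
    """
    above = areas["imu"] > threshold
    # Population counted only where the area's IMU score exceeds the threshold
    pop_above = areas["pop"].where(above, 0).groupby(areas["state"]).sum()
    pop_total = areas["pop"].groupby(areas["state"]).sum()
    return 100 * pop_above / pop_total


# Toy values for illustration only
facilities = pd.DataFrame({"state": ["X", "X", "Y"]})
areas = pd.DataFrame({
    "state": ["X", "X", "Y", "Y"],
    "pop": [200, 800, 100, 900],
    "imu": [65.0, 50.0, 70.0, 40.0],
})
counts = facilities_per_state(facilities)
shares = pct_pop_above_imu(areas)
print(counts, shares, sep="\n")
```

Expressing the result as a share of population rather than a count of areas keeps states with many small, sparsely populated areas from dominating the comparison.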