healthcare-analysis data-cleaning statistical-analysis

Capstone-Projects

Statistical analysis of heart disease data.

The dataset used in this project comes from four different sources:

Cleveland Clinic Foundation
Hungarian Institute of Cardiology, Budapest
V.A. Medical Center, Long Beach, CA
University Hospital, Zurich, Switzerland

The raw dataset contains 76 attributes; however, all published experiments refer to using a subset of 14 of them. These 14 attributes are age, sex, chest pain type, resting blood pressure, cholesterol level, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina (true/false), ST depression induced by exercise relative to rest, the slope of the peak exercise ST segment, number of major vessels colored by fluoroscopy, thallium test result, and heart disease risk value. The attributes were gathered to study and predict the presence of heart disease in a patient. The risk value refers to the presence of heart disease in the patient and is an integer ranging from 0 (no presence) to 4. This report takes these risk values as given and does not investigate how they were assigned.
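Before any of the questions can be answered, each raw row has to be mapped onto the 14 attributes and cleaned of missing values. The sketch below assumes each location's file is comma-separated, lists the 14 attributes in the order given above, and marks missing values with `?`; the short column names (`cp`, `trestbps`, `num`, and so on) and the `parse_record` helper are hypothetical choices for illustration, not taken from the project notebook.

```python
from __future__ import annotations

# Hypothetical short names for the 14 attributes, in the order listed above.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def parse_record(line: str) -> dict[str, float | None]:
    """Parse one comma-separated row into a dict keyed by attribute name.

    Assumes '?' marks a missing value; missing values become None so later
    steps can skip them explicitly instead of treating them as zero.
    """
    fields = line.strip().split(",")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return {
        name: None if raw.strip() == "?" else float(raw)
        for name, raw in zip(COLUMNS, fields)
    }
```

For example, `parse_record("63,1,1,145,233,1,2,150,0,2.3,3,?,6,0")` (an illustrative row) returns a dict with `chol` equal to `233.0` and `ca` equal to `None`. Tagging each parsed record with the location it came from makes the per-location comparison in Question 1 straightforward.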

It is unclear, however, whether the patients were admitted into the hospitals' care because of a suspected heart disease risk or whether they were drawn from a more general selection pool. Thus, these results may not be representative of the wider population. It is also recognized that this dataset was curated no later than July 1988; as such, it would not reflect present-day trends in health and culture with respect to heart disease.

With that said, this report addresses three separate questions concerning the statistical relationships between the four data-collection locations, as well as between some of the attributes within the data. The questions are as follows:

Question 1 - Do the four locations share a similar average heart disease risk value, or does one location stand out from the rest?
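One way to approach Question 1 is to average the risk value per location and then ask whether the spread between those averages is large relative to the spread within each location. The sketch below assumes each cleaned record is a dict carrying a hypothetical `location` key plus the risk value under `num`, and computes a one-way ANOVA F statistic by hand; it is an illustration of the technique, not necessarily the test used in the notebook.

```python
from statistics import fmean


def mean_risk_by_location(records):
    """Average the 0-4 risk value ('num') per location, skipping missing values."""
    groups = {}
    for rec in records:
        if rec["num"] is not None:
            groups.setdefault(rec["location"], []).append(rec["num"])
    return {loc: fmean(vals) for loc, vals in groups.items()}


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers.

    F = (between-group sum of squares / (k - 1))
        / (within-group sum of squares / (n - k))
    """
    all_vals = [v for g in groups for v in g]
    grand_mean = fmean(all_vals)
    group_means = [fmean(g) for g in groups]
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

With toy groups `[[0, 1, 2], [2, 3, 4]]`, the group means are 1 and 3 and the F statistic is 6.0. `scipy.stats.f_oneway` computes the same statistic along with a p-value; since the risk value is an ordinal 0 to 4 score, a rank-based test such as `scipy.stats.kruskal` may be a reasonable alternative.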

Question 2 - Does the data reflect an increased risk for heart disease in older patients?
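A simple first check for Question 2 is the correlation between age and the risk value: a clearly positive coefficient would be consistent with higher risk in older patients. The sketch below computes Pearson's r by hand and skips records where either value is missing; the record layout (`age` and `num` keys) and the `age_risk_correlation` helper are assumptions for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def age_risk_correlation(records):
    """Correlate age with risk value, ignoring records with a missing value."""
    pairs = [
        (r["age"], r["num"])
        for r in records
        if r["age"] is not None and r["num"] is not None
    ]
    if len(pairs) < 2:
        raise ValueError("not enough complete records")
    ages, risks = zip(*pairs)
    return pearson_r(ages, risks)
```

On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient. Because the risk value is ordinal, a rank correlation (Spearman) is a defensible alternative, and correlation alone does not control for other attributes that vary with age.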

Question 3 - How likely is a patient in the dataset to have a higher-than-average risk of heart disease if their cholesterol level is above 200 mg/dL?
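Question 3 asks for a conditional probability, P(risk above average | cholesterol > 200 mg/dL), which can be estimated empirically as a ratio of counts. The sketch below assumes "average" means the mean risk value over all records with both cholesterol and risk recorded; the function name and the `chol`/`num` keys are hypothetical.

```python
from statistics import fmean


def prob_above_avg_risk_given_high_chol(records, chol_threshold=200.0):
    """Estimate P(risk > mean risk | cholesterol > threshold) from counts.

    Only records with both 'chol' and 'num' present are used, and the
    mean risk is computed over those same records (an assumption).
    """
    usable = [r for r in records if r["chol"] is not None and r["num"] is not None]
    avg_risk = fmean(r["num"] for r in usable)
    high_chol = [r for r in usable if r["chol"] > chol_threshold]
    if not high_chol:
        raise ValueError("no patients above the cholesterol threshold")
    above_avg = sum(1 for r in high_chol if r["num"] > avg_risk)
    return above_avg / len(high_chol)
```

For four toy patients with (cholesterol, risk) values of (250, 3), (210, 0), (180, 1), and (300, 2), the mean risk is 1.5; of the three patients above 200 mg/dL, two exceed it, giving an estimate of 2/3. Comparing this with the unconditional share of above-average patients shows whether high cholesterol shifts the probability at all.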

About

A statistical analysis of heart disease data, completed as a project during my enrollment in the Data Science program through Thinkful.

