Heart-Disease-Kaggle-Project

Part 1: EDA

I applied exploratory data analysis (EDA) techniques to the Heart Disease data set available on Kaggle. These included identifying data issues, cleaning the data, getting to know the variables, finding correlations, and graphing the data.
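A minimal pandas sketch of the issue-checking, cleaning, and correlation steps described above follows. It is not the code from the notebook. It assumes the Kaggle file has been loaded into a DataFrame with a 0/1 `target` column (the column name is an assumption), and the helper names `summarize_issues`, `clean`, and `target_correlations` are hypothetical. The small `toy` frame is made-up data so the sketch runs on its own.

```python
import numpy as np
import pandas as pd


def summarize_issues(df: pd.DataFrame) -> dict:
    """Count missing values per column and exact duplicate rows."""
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then any row that still has a missing value."""
    return df.drop_duplicates().dropna().reset_index(drop=True)


def target_correlations(df: pd.DataFrame, target: str = "target") -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest (by absolute value) first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Tiny made-up example (NOT the Kaggle data): one duplicate row and one missing value.
toy = pd.DataFrame({
    "age": [63, 37, 41, 57, 57, 62],
    "chol": [233.0, 250.0, 204.0, 354.0, 354.0, np.nan],
    "target": [1, 1, 1, 0, 0, 0],
})
print(summarize_issues(toy))
print(target_correlations(clean(toy)))
```

With the real data, the same helpers would be called on the loaded DataFrame (for example one read with `pd.read_csv`; the file name is not given here). Dropping rows is only one cleaning choice; imputing missing values is an alternative when rows are scarce.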

Python libraries used: NumPy, pandas, Matplotlib, and Seaborn

You can see the EDA here: https://github.com/LisamShoe/Heart-Disease-Kaggle-Project/blob/master/Heart_Disease_EDA.ipynb

Part 2: Modeling

I ran multiple models to compare their accuracy on this data set. I also tried several ways of modifying the data, including splitting it into male and female subsets, applying normalization, and reducing dimensions. I achieved 90% accuracy in predicting heart disease in females and 87% accuracy overall. I also created a decision tree model for situations where the predictions need to be used in a low-tech environment; it reached 79% accuracy overall.
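One way to express normalization, dimension reduction, and a male/female split with scikit-learn is sketched below. This is an illustration under stated assumptions, not a reproduction of the notebook: it assumes a `sex` column coded 0/1 (in the commonly distributed version of this data set, 1 = male and 0 = female) and a `target` column. The choice of logistic regression as the classifier, the 5 PCA components, the helper names, and the synthetic `demo` data are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(n_components: int = 5):
    """Standardize features (normalization), project onto PCA components, then classify."""
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        LogisticRegression(max_iter=1000),
    )


def accuracy_by_group(df: pd.DataFrame, group_col: str = "sex", target: str = "target",
                      n_components: int = 5, cv: int = 5) -> dict:
    """Mean cross-validated accuracy on the full data and on each subset of group_col."""
    subsets = [("overall", df)]
    # Within a single sex the group column is constant, so it is dropped from those subsets.
    subsets += [(value, part.drop(columns=[group_col])) for value, part in df.groupby(group_col)]
    results = {}
    for name, part in subsets:
        X = part.drop(columns=[target])
        y = part[target]
        scores = cross_val_score(build_pipeline(n_components), X, y, cv=cv, scoring="accuracy")
        results[name] = float(scores.mean())
    return results


# Synthetic stand-in data (NOT the Kaggle data) so the sketch runs on its own.
X_syn, y_syn = make_classification(n_samples=300, n_features=8, random_state=0)
demo = pd.DataFrame(X_syn, columns=[f"x{i}" for i in range(8)])
demo["sex"] = np.random.default_rng(0).integers(0, 2, size=300)
demo["target"] = y_syn
print(accuracy_by_group(demo))
```

Keeping StandardScaler and PCA inside the pipeline means they are refit on each training fold, so the cross-validated accuracy is not inflated by information from the held-out folds.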

Python libraries used: pandas, scikit-learn, and Matplotlib

Machine learning algorithms tested: Logistic Regression, Decision Tree, Random Forest, AdaBoost, and k-nearest neighbors (KNN)
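The five algorithms can be compared on equal footing with cross-validation, and a depth-limited decision tree can be printed as plain if/else rules, which is one plausible way to make a tree usable in the low-tech setting mentioned above. This sketch is not the notebook's code: the hyperparameters (depth 3, 200 trees, 5 neighbors) are untuned assumptions, the helper names are hypothetical, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text


def candidate_models(random_state: int = 0) -> dict:
    """The five algorithm families listed above, with illustrative (untuned) settings."""
    return {
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=random_state),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=random_state),
        "AdaBoost": AdaBoostClassifier(random_state=random_state),
        # KNN is distance-based, so features are scaled first.
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    }


def compare_models(X, y, cv: int = 5) -> dict:
    """Mean cross-validated accuracy for each candidate model."""
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
        for name, model in candidate_models().items()
    }


def tree_rules(X, y, feature_names, max_depth: int = 3) -> str:
    """Fit a shallow tree and return its decision rules as plain text that can be printed."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
    return export_text(tree, feature_names=list(feature_names))


# Synthetic stand-in data (NOT the Kaggle data).
X_syn, y_syn = make_classification(n_samples=200, n_features=6, random_state=0)
print(compare_models(X_syn, y_syn))
print(tree_rules(X_syn, y_syn, [f"x{i}" for i in range(6)]))
```

A shallow tree usually gives up some accuracy compared with ensembles such as Random Forest, which is the trade-off for rules short enough to follow by hand.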

You can see the models here: https://github.com/LisamShoe/Heart-Disease-Kaggle-Project/blob/master/Modeling_And_Predicting_for_the_Heart_Disease_Data_Set.ipynb

About

As part of the LaunchCode Data Science Track, I performed EDA on the Heart Disease data set and tested the accuracy of different machine learning models. Predicted heart disease in females with 90% accuracy.
