I applied exploratory data analysis (EDA) techniques to the Heart Disease data set available from Kaggle. The techniques include identifying data issues, cleaning the data, getting to know the features, finding correlations, and graphing the data.
Python libraries used: NumPy, pandas, Matplotlib, and Seaborn
You can see the EDA here: https://github.com/LisamShoe/Heart-Disease-Kaggle-Project/blob/master/Heart_Disease_EDA.ipynb
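
For readers who want a feel for the EDA steps before opening the notebook, here is a minimal sketch of checking for data issues, cleaning, and ranking correlations with pandas. It is not the notebook's code. It runs on a small synthetic table, and the column names (`age`, `sex`, `chol`, `thalach`, `target`) and helper functions are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd


def make_demo_frame(n_rows=200, seed=0):
    """Build a synthetic stand-in for the heart disease table (not real data)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(29, 78, size=n_rows),
        "sex": rng.integers(0, 2, size=n_rows),
        "chol": rng.normal(245, 50, size=n_rows).round(),
        "thalach": rng.normal(150, 22, size=n_rows).round(),
    })
    # Make the label loosely depend on age so the correlation step has signal.
    noise = rng.normal(0, 8, size=n_rows)
    df["target"] = ((df["age"] + noise) > 55).astype(int)
    # Inject data issues to clean up: two missing values and one duplicate row.
    df.loc[[3, 17], "chol"] = np.nan
    return pd.concat([df, df.iloc[[0]]], ignore_index=True)


def clean(df):
    """Drop exact duplicate rows and fill missing values with column medians."""
    out = df.drop_duplicates().reset_index(drop=True)
    return out.fillna(out.median(numeric_only=True))


def target_correlations(df, target="target"):
    """Pearson correlation of every feature with the label, strongest first."""
    corr = df.corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


raw = make_demo_frame()
print(raw.isna().sum())        # missing values per column
print(raw.duplicated().sum())  # number of exact duplicate rows

tidy = clean(raw)
print(tidy.describe())         # basic summary statistics
print(target_correlations(tidy))
```

With the real Kaggle CSV, `make_demo_frame()` would be replaced by `pd.read_csv(...)` on the downloaded file, and the correlation table could be passed to a Seaborn heatmap for graphing.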
I ran multiple models to compare their accuracy on this data set. I also tried modifying the data, including splitting it into male and female subsets, applying normalization, and reducing dimensions. I achieved 90% accuracy in predicting heart disease in females and 87% accuracy overall. I also created a decision tree model in case the predictions need to be made in a low-tech environment; it showed 79% accuracy overall.
Python libraries used: pandas, scikit-learn, and Matplotlib
Machine learning algorithms tested: Logistic Regression, Decision Tree, Random Forest, AdaBoost, and KNN
You can see the models here: https://github.com/LisamShoe/Heart-Disease-Kaggle-Project/blob/master/Modeling_And_Predicting_for_the_Heart_Disease_Data_Set.ipynb
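
As a rough illustration of the comparison described above, the sketch below fits the five listed algorithms with scikit-learn and prints each test accuracy. It is not the notebook's code. It uses synthetic data from `make_classification`, so the scores it prints say nothing about the real data set, and the hyperparameters, the 13-feature shape, and the `f0`...`f12` feature names are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (assumption): 300 rows, 13 numeric features.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale-sensitive models get normalization; one variant adds PCA
# as an example of reducing dimensions.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "KNN + PCA": make_pipeline(
        StandardScaler(), PCA(n_components=5), KNeighborsClassifier(n_neighbors=5)
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.2f}")

# A shallow tree can be printed as plain if/else rules for low-tech use.
feature_names = [f"f{i}" for i in range(X.shape[1])]
print(export_text(models["Decision Tree"], feature_names=feature_names))
```

Capping the tree at `max_depth=3` keeps the printed rules short enough to follow by hand, which is the trade-off behind using a simpler model where no computer is available.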