This project first performs Exploratory Data Analysis (EDA) and data visualization on the diabetes dataset and then predicts diabetes using machine learning.
Diabetes data can be downloaded from
http://biostat.mc.vanderbilt.edu/wiki/Main/DataSets?CGISESSID=10713f6d891653ddcbb7ddbdd9cffb79
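
Once downloaded, the file can be read with Python's standard-library `csv` module. The sketch below is illustrative, not this project's code. It assumes the data is a CSV file with a header row and that empty cells mean "missing". `load_columns` and `missing_counts` are hypothetical helper names, the file name `diabetes.csv` is an assumption, and the demo rows are made up. The per-column missing counts are the numbers a missing data map visualizes.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO


def load_columns(f: TextIO) -> Dict[str, List[Optional[str]]]:
    """Read a CSV with a header row into {column name: list of cells}.

    Empty or absent cells become None so they can be counted as missing.
    Values stay as strings; numeric conversion is left to later steps.
    """
    reader = csv.DictReader(f)
    columns: Dict[str, List[Optional[str]]] = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name, values in columns.items():
            # Short rows yield None for absent fields; treat them like empty cells.
            cell = (row.get(name) or "").strip()
            values.append(cell or None)
    return columns


def missing_counts(columns: Dict[str, List[Optional[str]]]) -> Dict[str, int]:
    """Number of missing cells per column."""
    return {name: sum(v is None for v in values) for name, values in columns.items()}


# Demo on made-up rows; for the real file (name assumed): open("diabetes.csv", newline="")
cols = load_columns(io.StringIO("age,glucose\n50,\n,90\n61,110\n"))
print(missing_counts(cols))  # {'age': 1, 'glucose': 1}
```
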
- Descriptive statistics: attribute types, class distribution, mean, standard deviation, median, quartiles, skewness, correlation
- Data visualization:
  - Histogram plot
  - Density plot
  - Box-and-whisker plot
  - Bar plot
  - Missing data map
  - Pairwise correlation plot
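
The statistics listed above can all be computed with the Python standard library. This is a sketch under stated assumptions, not the project's implementation. The helper names are hypothetical, and missing values are assumed to have been removed from each column beforehand. Skewness uses the plain moment formula, so it will differ slightly from pandas' bias-corrected `Series.skew()`. The histogram, correlation and class-distribution helpers return the numbers that a plotting library such as matplotlib or seaborn would then draw.

```python
import math
import statistics
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def skewness(xs: Sequence[float]) -> float:
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample bias correction)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


def describe(xs: Sequence[float]) -> Dict[str, float]:
    """Summary of one numeric column; missing values must be removed first."""
    # "inclusive" = linear interpolation between order statistics (pandas' default).
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "mean": statistics.mean(xs),
        "std": statistics.stdev(xs),  # sample standard deviation (divides by n - 1)
        "median": statistics.median(xs),
        "q1": q1,
        "q3": q3,
        "skew": skewness(xs),
    }


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Correlation for every pair of columns (the data behind a pairwise correlation plot)."""
    return {(a, b): pearson(columns[a], columns[b]) for a in columns for b in columns}


def histogram(xs: Sequence[float], bins: int = 10) -> List[int]:
    """Counts in equal-width bins spanning [min, max]; the last bin includes max."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [len(xs)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    return counts


def class_distribution(labels: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Number of rows per class label (the data behind a bar plot)."""
    return dict(Counter(labels))


# Demo on made-up values (not taken from the diabetes dataset).
print(describe([1, 2, 3, 4, 5]))
print(histogram([1, 2, 3, 4, 5], bins=2))  # [2, 3]
print(class_distribution(["no", "yes", "no"]))  # {'no': 2, 'yes': 1}
```
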
We compare the performance of the following classifiers:
- Logistic Regression
- Support Vector Machine (SVM)
- Random Forest
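
How the classifiers are trained and scored is not specified here. One common setup is a single hold-out split scored by accuracy; the sketch below assumes that setup, binary 0/1 labels, and complete numeric feature rows (missing values dropped or imputed first). To stay dependency-free it implements only logistic regression (plain batch gradient descent, no regularization) plus a majority-class baseline. All class and function names are hypothetical, and the demo uses synthetic data, not the diabetes dataset. scikit-learn's `LogisticRegression`, `SVC` and `RandomForestClassifier` follow the same `fit`/`predict` convention (with `fit` returning the estimator), so they could be passed to `compare` in place of these classes.

```python
import math
import random
from typing import Dict, List, Tuple

Rows = List[List[float]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class GDLogisticRegression:
    """Binary logistic regression fitted by batch gradient descent, no regularization."""

    def __init__(self, lr: float = 0.1, epochs: int = 500):
        self.lr, self.epochs = lr, epochs

    def _prob(self, x: List[float]) -> float:
        return sigmoid(self.b + sum(w * v for w, v in zip(self.w, x)))

    def fit(self, X: Rows, y: List[int]) -> "GDLogisticRegression":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self._prob(xi) - yi  # gradient of the log-loss w.r.t. the logit
                for j in range(d):
                    gw[j] += err * xi[j]
                gb += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self

    def predict(self, X: Rows) -> List[int]:
        return [1 if self._prob(x) >= 0.5 else 0 for x in X]


class MajorityBaseline:
    """Always predicts the most frequent training label; a floor for the comparison."""

    def fit(self, X: Rows, y: List[int]) -> "MajorityBaseline":
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X: Rows) -> List[int]:
        return [self.label] * len(X)


def split(X: Rows, y: List[int], test_frac: float = 0.25, seed: int = 0):
    """Shuffle row indices once, then cut them into a training and a test part."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [X[i] for i in te], [y[i] for i in tr], [y[i] for i in te]


def standardize(train: Rows, test: Rows) -> Tuple[Rows, Rows]:
    # Use training-set mean/std only, so no test information leaks into the scaling.
    n, d = len(train), len(train[0])
    means = [sum(r[j] for r in train) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in train) / n) or 1.0 for j in range(d)]

    def scale(rows: Rows) -> Rows:
        return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

    return scale(train), scale(test)


def compare(models: Dict[str, object], X: Rows, y: List[int], seed: int = 0) -> Dict[str, float]:
    """Hold-out accuracy of each model, all evaluated on the same train/test split."""
    X_tr, X_te, y_tr, y_te = split(X, y, seed=seed)
    X_tr, X_te = standardize(X_tr, X_te)
    scores = {}
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        scores[name] = sum(p == t for p, t in zip(pred, y_te)) / len(y_te)
    return scores


# Demo on synthetic, linearly separable data (not the diabetes dataset).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [1 if a + b > 0 else 0 for a, b in X]
print(compare({"logistic": GDLogisticRegression(), "majority": MajorityBaseline()}, X, y))
```

With a single split, accuracy on a small dataset can shift noticeably with the seed; k-fold cross-validation (for example scikit-learn's `cross_val_score`) gives a steadier comparison between the three classifiers.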