- Download the ‘Stroke Prediction Dataset’ using the scikit-learn library; the reference is given below.
- Randomly split the data into training and testing sets in a 7:3 ratio, and repeat the split 100 times. For each split, perform the following tasks on the training data and evaluate performance on the testing data. The testing data must remain unseen in all steps.
- Apply one of the best-known imputation methods to handle missing or infinite values, and explain the significance of the chosen method if required.
- Visualize the data in a 3-D scatter plot and write your inferences about how the data look.
- Make a boxplot for each feature and highlight the outliers, if any. Then remove the outliers, visualize the data again in a 3-D scatter plot to show the effect of the outliers, and write your inferences.
- Normalize the data if required, and write a note on what normalization you performed, why it was needed, and how you applied it.
- Balance the data if required; you may increase the number of samples using upsampling if needed.
- Apply at least three clustering methods with varying numbers of clusters. Use any three well-known methods to determine the correct number of clusters for each clustering method, and explain how you finalized this number.
- Apply at least three supervised classification methods, and report at least three performance metrics from (accuracy, precision, Cohen's kappa, F1-score, MCC, sensitivity, and specificity), with a justification for each choice.
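
The repeated 7:3 split with an unseen test set can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic (from `make_classification`) rather than the real dataset, median imputation and standard scaling are one possible choice among many, the split is stratified (an assumption, since the task only says "randomly"), and `upsample_minority` is a hypothetical helper written for this example. The key point is that the imputer, the scaler, and the upsampling are all fitted on the training portion only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, f1_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

# Synthetic, imbalanced stand-in for the real dataset (assumption).
X, y = make_classification(n_samples=600, n_features=6,
                           weights=[0.85, 0.15], random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # inject roughly 5% missing values
X[0, 0] = np.inf                        # inject one infinite value
X[~np.isfinite(X)] = np.nan             # treat infinite values as missing


def upsample_minority(X_part, y_part, seed):
    """Hypothetical helper: resample every class up to the majority count."""
    classes, counts = np.unique(y_part, return_counts=True)
    n_max = int(counts.max())
    X_out, y_out = [], []
    for cls, n in zip(classes, counts):
        X_cls = X_part[y_part == cls]
        if n < n_max:
            X_cls = resample(X_cls, replace=True, n_samples=n_max,
                             random_state=seed)
        X_out.append(X_cls)
        y_out.append(np.full(len(X_cls), cls))
    return np.vstack(X_out), np.concatenate(y_out)


N_REPEATS = 100
scores = {"f1": [], "mcc": [], "kappa": []}

for seed in range(N_REPEATS):
    # 7:3 split; a different seed gives a different random partition.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)

    # Fit preprocessing on training data only, then apply it to test data.
    preprocess = make_pipeline(SimpleImputer(strategy="median"),
                               StandardScaler())
    X_tr_p = preprocess.fit_transform(X_tr)
    X_te_p = preprocess.transform(X_te)

    # Balance the training set only; the test set keeps its natural ratio.
    X_bal, y_bal = upsample_minority(X_tr_p, y_tr, seed)

    clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
    y_pred = clf.predict(X_te_p)

    scores["f1"].append(f1_score(y_te, y_pred))
    scores["mcc"].append(matthews_corrcoef(y_te, y_pred))
    scores["kappa"].append(cohen_kappa_score(y_te, y_pred))

# Mean and standard deviation of each metric over the repeated splits.
summary = {name: (float(np.mean(v)), float(np.std(v)))
           for name, v in scores.items()}
print(summary)
```

Reporting the mean and spread over the 100 splits, rather than a single score, shows how sensitive each metric is to the particular partition. On imbalanced data, F1, MCC, and Cohen's kappa are usually more informative than plain accuracy, which is one possible justification for choosing them.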
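
For the outlier step, a boxplot conventionally flags points that lie more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch of that rule is shown below, assuming numeric features that have already been imputed; the function names `iqr_bounds` and `inlier_mask` and the synthetic data are hypothetical and exist only for this example. The bounds are computed from the training data alone so that the test data stay unseen.

```python
import numpy as np


def iqr_bounds(X_train, k=1.5):
    """Per-feature lower and upper fences using the boxplot (IQR) rule."""
    q1 = np.percentile(X_train, 25, axis=0)
    q3 = np.percentile(X_train, 75, axis=0)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def inlier_mask(X, lower, upper):
    """True for rows where every feature lies inside its fences."""
    return np.all((X >= lower) & (X <= upper), axis=1)


# Synthetic training data with five deliberately extreme rows (assumption).
rng = np.random.default_rng(42)
X_train = rng.normal(size=(200, 3))
X_train[:5] = 25.0

lower, upper = iqr_bounds(X_train)
mask = inlier_mask(X_train, lower, upper)
X_clean = X_train[mask]

print("rows before:", len(X_train), "rows after:", len(X_clean))
```

The same boolean mask can be used to colour points in the 3-D scatter plot before removal, which makes the before-and-after comparison easier to describe.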
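
For choosing the number of clusters, three commonly used internal criteria are the silhouette score, the Calinski-Harabasz index, and the Davies-Bouldin index; the elbow method and the gap statistic are other options. The sketch below applies these three criteria to k-means only, on synthetic blobs with three hand-picked centres (an assumption made so the example is easy to check). The same loop could be repeated for other clustering methods, for example agglomerative clustering or Gaussian mixtures.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)

# Synthetic data with three well-separated groups (assumption).
X, _ = make_blobs(n_samples=300, centers=[[-6, -6], [0, 6], [6, -6]],
                  cluster_std=0.5, random_state=0)

results = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    results[k] = {
        "silhouette": silhouette_score(X, labels),                # higher is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
    }

# Number of clusters preferred by each criterion.
best_k = {
    "silhouette": max(results, key=lambda k: results[k]["silhouette"]),
    "calinski_harabasz": max(results, key=lambda k: results[k]["calinski_harabasz"]),
    "davies_bouldin": min(results, key=lambda k: results[k]["davies_bouldin"]),
}
print(best_k)
```

When the criteria disagree on real data, one reasonable way to finalize the number is to take the value most criteria agree on and state that reasoning explicitly in the report.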