This project aims to find the best way to predict breast cancer using machine learning algorithms. The models compared include tree-based classifiers, regression-based models, and probability-based models. The best model is chosen using three metrics: mean accuracy, F1-score, and AUC score.
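As a rough illustration of the three metrics named above, the sketch below computes them in plain Python on made-up labels and scores. The function names and the toy data are assumptions for illustration only, not the project's code; in practice a library such as scikit-learn would normally be used instead.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc_score(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = malignant, 0 = benign.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]                # hard class predictions
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]    # predicted probability of class 1

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")   # 0.667
print(f"F1-score = {f1_score(y_true, y_pred):.3f}")   # 0.667
print(f"AUC      = {auc_score(y_true, scores):.3f}")  # 0.889
```

Accuracy and F1 are computed from hard class predictions, while AUC needs a continuous score per sample. "Mean accuracy" is then presumably the accuracy averaged over the cross-validation folds.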
The datasets used for this project are the Wisconsin Diagnostic and Wisconsin Prognostic datasets, both available from the UCI Machine Learning Repository. The features in these datasets were computed by measuring the dimensions of high-resolution scans of the tissue.
- Data Preprocessing
- Data Visualization
- K-Fold Cross-Validation
- Model Fitting and Evaluation
- No Oversampling 1 (all columns)
- No Oversampling 2 (dropping columns)
- Oversampling 1 (all columns)
- Oversampling 2 (dropping columns)
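
The K-fold cross-validation and oversampling steps listed above could be combined roughly as in the sketch below. It makes several assumptions that this outline does not confirm: the folds are stratified by class, oversampling is plain random duplication of minority-class rows (the project may use a different technique), and oversampling is applied only to the training part of each fold so that duplicated rows never reach the test part. The helper names `stratified_k_fold` and `random_oversample` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class proportions roughly equal per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin across the k folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


def random_oversample(indices, labels, seed=0):
    """Duplicate randomly chosen indices of smaller classes until all classes are the same size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Toy labels (hypothetical): 12 benign (0) and 8 malignant (1) samples.
labels = [0] * 12 + [1] * 8

for fold, (train_idx, test_idx) in enumerate(stratified_k_fold(labels, k=4), start=1):
    # Oversample the training indices only; the test indices stay untouched.
    train_balanced = random_oversample(train_idx, labels)
    n_pos = sum(labels[i] for i in train_balanced)
    n_neg = len(train_balanced) - n_pos
    print(f"fold {fold}: train {n_neg} vs {n_pos} after oversampling, test size {len(test_idx)}")
```

Both helpers work on row indices rather than on the data itself, so the same splits can be reused for the "all columns" and "dropping columns" variants by selecting different feature columns afterwards.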