The dataset used in this example is imbalanced, fairly large, and high-dimensional. The purpose of this example is to show how to handle imbalanced datasets. The approach taken here is fairly simple, and it is only one of many possible.
In this project, the following tasks are performed:
- Data Exploration
- Data Cleaning
- Feature Engineering
Techniques used:
- Oversampling
- Undersampling
- SMOTE
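
To make the three resampling techniques above concrete, here is a minimal sketch in plain Python. It is not the code used in this project: the toy data, the function names (`random_oversample`, `random_undersample`, `smote`) and the parameter `k` are all assumptions made for illustration. It also assumes a binary target and purely numeric features, so categorical columns such as those in the census data would need to be encoded first.

```python
import random

def random_oversample(X, y, minority_label, rng):
    """Duplicate random minority rows until both classes have the same size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    extra = [rng.choice(minority) for _ in range(max(n_needed, 0))]
    return X + extra, y + [minority_label] * len(extra)

def random_undersample(X, y, minority_label, rng):
    """Keep every minority row and an equally sized random subset of the rest."""
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    majority_idx = [i for i, label in enumerate(y) if label != minority_label]
    kept = minority_idx + rng.sample(majority_idx, len(minority_idx))
    return [X[i] for i in kept], [y[i] for i in kept]

def smote(X, y, minority_label, rng, k=3):
    """Add synthetic minority rows by interpolating towards a nearby minority row."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (squared Euclidean distance)
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between base and neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return X + synthetic, y + [minority_label] * len(synthetic)

# Hypothetical toy data: 90 majority rows (label 0) and 10 minority rows (label 1)
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

X_over, y_over = random_oversample(X, y, 1, rng)
X_under, y_under = random_undersample(X, y, 1, rng)
X_smote, y_smote = smote(X, y, 1, rng)

for name, labels in [("over", y_over), ("under", y_under), ("smote", y_smote)]:
    print(name, labels.count(0), labels.count(1))
```

Resampling should normally be applied to the training split only, so that the test split keeps the original class distribution. In practice a library such as imbalanced-learn (`RandomOverSampler`, `RandomUnderSampler`, `SMOTE`) is likely a better choice than hand-written helpers like these.
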
ML algorithms:
- Naive Bayes
- XGBoost
Download the dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/census-income-mld/