Handling the Imbalanced Data Problem Using Census Data

The data used in this example is imbalanced, fairly large, and high-dimensional. The purpose of this example is to show how to handle imbalanced datasets. The approach taken here is fairly simple and is only one of many possible approaches.

The following tasks are performed in this project:

Data Exploration
Data Cleaning
Feature Engineering
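
A natural first step in data exploration is to measure how skewed the target classes are. The snippet below is a minimal sketch in base R; the `income` vector and its class counts are hypothetical placeholders, not figures taken from the census data.

```r
# Hypothetical label vector standing in for the census income column.
# The 940/60 split is made up for illustration only.
income <- factor(rep(c("below_50k", "above_50k"), times = c(940, 60)))

# Absolute and relative class frequencies
counts <- table(income)
props  <- round(prop.table(counts), 3)
print(counts)
print(props)

# Ratio of the largest class to the smallest class
imbalance_ratio <- max(counts) / min(counts)
print(imbalance_ratio)
```

With the real data, `income` would be replaced by the target column of the loaded data frame.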

Techniques used:

Oversampling
Undersampling
SMOTE
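
The three resampling techniques listed above can be sketched in base R as follows. This is an illustration on hypothetical toy data, not the code used in this project: the data frame `toy`, its columns, and the helper `smote_sketch` are all assumptions introduced here. The SMOTE part is a simplified version that interpolates between a minority row and one of its `k` nearest minority neighbours, and it only handles numeric features.

```r
set.seed(1)

# Hypothetical toy data: 950 majority rows ("low") and 50 minority rows ("high")
toy <- data.frame(
  x1 = c(rnorm(950, 0), rnorm(50, 2)),
  x2 = c(rnorm(950, 0), rnorm(50, 2)),
  y  = factor(rep(c("low", "high"), times = c(950, 50)),
              levels = c("low", "high"))
)
majority <- toy[toy$y == "low", ]
minority <- toy[toy$y == "high", ]

# Random oversampling: draw minority rows with replacement
# until the minority class matches the majority class in size
over_idx    <- sample(nrow(minority), nrow(majority), replace = TRUE)
oversampled <- rbind(majority, minority[over_idx, ])

# Random undersampling: keep a random subset of majority rows
# equal in size to the minority class
under_idx    <- sample(nrow(majority), nrow(minority))
undersampled <- rbind(majority[under_idx, ], minority)

# Simplified SMOTE (hypothetical helper): each synthetic row lies on the
# segment between a minority row and one of its k nearest minority neighbours
smote_sketch <- function(X, n_new, k = 5) {
  X <- as.matrix(X)
  d <- as.matrix(dist(X))   # pairwise Euclidean distances among minority rows
  diag(d) <- Inf            # a row is not its own neighbour
  synth <- matrix(NA_real_, nrow = n_new, ncol = ncol(X),
                  dimnames = list(NULL, colnames(X)))
  for (i in seq_len(n_new)) {
    seed_row  <- sample.int(nrow(X), 1)
    neighbours <- order(d[seed_row, ])[seq_len(k)]
    neighbour  <- neighbours[sample.int(k, 1)]
    gap <- runif(1)         # random position along the segment
    synth[i, ] <- X[seed_row, ] + gap * (X[neighbour, ] - X[seed_row, ])
  }
  as.data.frame(synth)
}

synthetic   <- smote_sketch(minority[, c("x1", "x2")],
                            n_new = nrow(majority) - nrow(minority))
synthetic$y <- factor("high", levels = levels(toy$y))
smoted      <- rbind(toy, synthetic)

# Class counts after each technique
print(table(oversampled$y))
print(table(undersampled$y))
print(table(smoted$y))
```

Resampling is normally applied to the training split only, so that the test set keeps the original class distribution. Categorical columns would need a different treatment than the numeric interpolation shown here.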

ML algorithms:

Naive Bayes
XGBoost
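
As a rough sketch of how the two algorithms above might be fitted in R, the snippet below trains each on hypothetical toy data. It assumes the `e1071` package for Naive Bayes and the `xgboost` package for XGBoost; the project may use different packages, features, and settings. The data frame `toy`, its columns, and the value `nrounds = 20` are illustrative assumptions.

```r
set.seed(42)

# Hypothetical toy data standing in for the cleaned census features
n <- 1000
toy <- data.frame(
  age   = round(runif(n, 18, 80)),
  hours = round(runif(n, 10, 60))
)
p <- plogis(-7 + 0.04 * toy$age + 0.06 * toy$hours)  # rare positive class
toy$income <- factor(ifelse(runif(n) < p, "high", "low"),
                     levels = c("low", "high"))

# Simple train/test split
train_idx <- sample(n, 700)
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Naive Bayes (assumes the e1071 package is installed)
if (requireNamespace("e1071", quietly = TRUE)) {
  nb      <- e1071::naiveBayes(income ~ age + hours, data = train)
  nb_pred <- predict(nb, newdata = test)
  print(table(actual = test$income, predicted = nb_pred))
}

# XGBoost (assumes the xgboost package is installed)
if (requireNamespace("xgboost", quietly = TRUE)) {
  x_train <- as.matrix(train[, c("age", "hours")])
  y_train <- as.integer(train$income == "high")   # 1 = minority class
  x_test  <- as.matrix(test[, c("age", "hours")])

  dtrain <- xgboost::xgb.DMatrix(data = x_train, label = y_train)
  bst <- xgboost::xgb.train(
    params  = list(objective = "binary:logistic"),
    data    = dtrain,
    nrounds = 20                                   # illustrative value
  )

  # Predicted probabilities of the "high" class, thresholded at 0.5
  xgb_prob <- predict(bst, xgboost::xgb.DMatrix(data = x_test))
  xgb_pred <- factor(ifelse(xgb_prob > 0.5, "high", "low"),
                     levels = c("low", "high"))
  print(table(actual = test$income, predicted = xgb_pred))
}
```

On imbalanced data, the confusion table is more informative than overall accuracy, because a model that always predicts the majority class can still score a high accuracy.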

Download dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/census-income-mld/
