MariemAshraf / handle_imabalnce_class

Address imbalanced classes in machine learning projects.

Working with highly imbalanced datasets in machine learning projects.

Basic Information:

This project was part of a skill test for a recent job interview for a "Machine Learning Engineer" position. I had to complete the project in 48 hours, which included writing a 10-page report in LaTeX. The dataset has three classes and is highly imbalanced. The primary objective of this project was to handle the data imbalance issue. In the following subsections, I describe three techniques I used to overcome the data imbalance problem.

Datasets

There are three labels [1, 2, 3] in the training data, which makes this a multi-class problem. The training dataset has 17 features and 38,829 data points, whereas the test data has 16 features, without the label, and 16,641 data points. The training dataset is highly imbalanced: the majority of the data belongs to class-1 (95 percent), whereas class-2 and class-3 account for 3.0 percent and 0.87 percent of the data respectively. Since the datasets contain no null values and are already scaled, I did not do any further preprocessing. For internal reasons, I am not sharing the datasets, only the detailed results and techniques. The following figure shows the data imbalance.
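Since the datasets are not shared, the class split described above can be checked on any labelled data with a few lines of pandas. The following is a minimal sketch, not the project's own code: the `class_distribution` helper, the `label` column name, and the synthetic labels (with placeholder probabilities chosen only to resemble the reported skew) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def class_distribution(labels: pd.Series) -> pd.DataFrame:
    """Return per-class counts and percentages, sorted by class label."""
    counts = labels.value_counts().sort_index()
    percent = (counts / counts.sum() * 100).round(2)
    return pd.DataFrame({"count": counts, "percent": percent})


# Synthetic stand-in for the unshared training labels (hypothetical data).
# Only the number of rows (38829) and the label set [1, 2, 3] come from
# the description; the probabilities are illustrative placeholders.
rng = np.random.default_rng(0)
labels = pd.Series(
    rng.choice([1, 2, 3], size=38829, p=[0.96, 0.03, 0.01]),
    name="label",
)

summary = class_distribution(labels)
print(summary)
```

With real data, `labels` would be the label column of the training DataFrame. Calling `summary["count"].plot(kind="bar")` draws a bar chart of the class counts with matplotlib.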

Code and Libraries

I used Python 3.0. The following Python libraries are also required:

  • Jupyterlab
  • NumPy
  • Pandas
  • matplotlib
  • scikit-learn
  • seaborn

Contributors

Sabber Ahamed sabbers@gmail.com

License

MIT
