Machine Learning for Imbalanced Data

This is the code repository for Machine Learning for Imbalanced Data, published by Packt.

Tackle imbalanced datasets using machine learning and deep learning techniques

What is this book about?

As machine learning practitioners, we often encounter imbalanced datasets in which one class has considerably fewer instances than the other. Many machine learning algorithms assume a roughly balanced distribution of majority and minority classes, which leads to suboptimal performance on imbalanced data. This guide helps you address class imbalance and improve model performance.
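
To make the problem concrete, here is a minimal sketch in plain Python. It is not taken from the book: the 95/5 class split and the always-predict-the-majority "model" are assumptions chosen purely for illustration. It shows how accuracy can look strong while the minority class is missed entirely.

```python
from collections import Counter

# Hypothetical labels: 95 majority (0) and 5 minority (1) examples.
y_true = [0] * 95 + [1] * 5

# A trivial "model" that always predicts the majority class.
majority_class = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_class] * len(y_true)

# Accuracy: fraction of all predictions that are correct.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Minority-class recall: fraction of actual minority examples that were found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(t == 1 for t in y_true)

print(f"accuracy={accuracy:.2f}, minority recall={recall:.2f}")
# prints: accuracy=0.95, minority recall=0.00
```

A model that never detects the minority class still scores 95% accuracy here, which is why metrics other than accuracy matter when classes are imbalanced.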

This book covers the following exciting features:

Use imbalanced data in your machine learning models effectively
Explore the metrics used when classes are imbalanced
Understand how and when to apply various sampling methods such as over-sampling and under-sampling
Apply data-based, algorithm-based, and hybrid approaches to deal with class imbalance
Combine and choose from various options for data balancing while avoiding common pitfalls
Understand the concepts of model calibration and threshold adjustment in the context of dealing with imbalanced datasets
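
As a rough illustration of the over-sampling idea mentioned above, the following sketch duplicates randomly chosen minority rows until every class matches the size of the largest one. The function name `random_oversample`, the toy data, and the seed are hypothetical and not from the book; the book's notebooks may rely on dedicated libraries instead.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Rows belonging to this class in the original data.
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Sample with replacement to make up the shortfall.
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


# Toy data (assumed for illustration): 8 majority rows, 2 minority rows.
X = [[i, i * 2] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Resampling of this kind is normally applied to the training split only, so that evaluation data keeps its original class distribution.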

If you feel this book is for you, get your copy today!

Instructions and Navigations

All of the code is organized into folders.

The code will look like the following:

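import matplotlib.pyplot as plt
import seaborn as sns

# make_data is a helper assumed to be defined in the book's notebooks
separation = 2
X, y = make_data(sep=separation)
print(y.value_counts())  # class counts for the generated labels
sns.scatterplot(data=X, x="feature_1", y="feature_2")
plt.title('Separation: {}'.format(separation))
plt.show()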

Here is what you need for this book: This book is for machine learning practitioners who want to address the challenges of imbalanced datasets in their projects. Data scientists, machine learning engineers and scientists, research scientists and engineers, and data engineers will find it helpful. Complete beginners are welcome, but some familiarity with core machine learning concepts will help readers get the most out of it.

With the following software and hardware, you can run all the code files in the book (Chapters 1-10).

Software and Hardware List

Chapter	Software required	OS required
1-10	Google Colab	Any OS

Get to Know the Authors

Kumar Abhishek is a seasoned Senior Machine Learning Engineer at Expedia Group, US, specializing in risk analysis and fraud detection for Expedia brands. With over a decade of experience at companies such as Microsoft, Amazon, and a Bay Area startup, Kumar holds an MS in Computer Science from the University of Florida.

Dr. Mounir Abdelaziz is a deep learning researcher specializing in computer vision applications. He holds a Ph.D. in computer science and technology from Central South University, China. During his Ph.D. journey, he developed innovative algorithms to address practical computer vision challenges. He has also authored numerous research articles in the field of few-shot learning for image classification.

Table of Contents and Code Notebooks

Introduction to Data Imbalance in Machine Learning
Oversampling Methods
Undersampling Methods
Ensemble Methods
Cost-Sensitive Learning
Data Imbalance in Deep Learning
Data-Level Deep Learning Methods
Algorithm-Level Deep Learning Techniques
Hybrid Deep Learning Methods
Model Calibration
