This is the code repository for Machine Learning for Imbalanced Data, published by Packt.
Tackle imbalanced datasets using machine learning and deep learning techniques
As machine learning practitioners, we often encounter imbalanced datasets in which one class has considerably fewer instances than the other. Many machine learning algorithms implicitly assume that the classes are roughly balanced, so they tend to perform poorly on the minority class when the data is imbalanced. This comprehensive guide helps you address class imbalance to significantly improve model performance.
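To illustrate why this matters, here is a minimal sketch (not code from the book) of the "accuracy paradox". It assumes a hypothetical dataset with 990 majority samples (label 0) and 10 minority samples (label 1), and a model that always predicts the majority class. Accuracy looks excellent, but recall on the minority class is zero.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives (the minority class here) that were predicted as positive."""
    true_positives = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_positives = sum(t == positive for t in y_true)
    return true_positives / actual_positives if actual_positives else 0.0


# Hypothetical imbalanced labels: 99% majority (0), 1% minority (1)
y_true = [0] * 990 + [1] * 10
# A trivial "model" that always predicts the majority class
y_pred = [0] * len(y_true)

print("accuracy:", accuracy(y_true, y_pred))
print("minority recall:", recall(y_true, y_pred))
```

The trivial model scores 99% accuracy while missing every minority example, which is why the book looks beyond plain accuracy when classes are imbalanced.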
This book covers the following exciting features:
- Use imbalanced data in your machine learning models effectively
- Explore the metrics used when classes are imbalanced
- Understand how and when to apply various sampling methods such as oversampling and undersampling
- Apply data-based, algorithm-based, and hybrid approaches to deal with class imbalance
- Choose among and combine various data-balancing options while avoiding common pitfalls
- Understand the concepts of model calibration and threshold adjustment in the context of dealing with imbalanced datasets
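As a rough illustration of the simplest sampling method mentioned above, the sketch below implements plain random oversampling in pure Python: it duplicates randomly chosen minority samples until every class matches the largest one. It is a hypothetical helper (`random_oversample`), not the book's code; libraries such as imbalanced-learn provide this as `RandomOverSampler`.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate random samples of each smaller class
    until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # All rows belonging to this class
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(list(rng.choice(members)))  # copy a randomly drawn row
            y_out.append(label)
    return X_out, y_out


# Toy data: 10 majority samples (label 0) and 2 minority samples (label 1)
X = [[float(i), float(i) * 2] for i in range(12)]
y = [0] * 10 + [1] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y), "->", Counter(y_res))  # the minority class grows from 2 to 10 samples
```

One common pitfall: resampling should be applied only to the training split, after splitting the data, so that duplicated samples do not leak into the evaluation set.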
If you feel this book is for you, get your copy today!
All of the code is organized into folders.
The code will look like the following:
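import matplotlib.pyplot as plt
import seaborn as sns
# make_data is assumed to be a helper defined earlier in the chapter notebook;
# it returns a DataFrame of features and a Series of class labels
separation = 2
X, y = make_data(sep=separation)
print(y.value_counts())  # class counts show the degree of imbalance
sns.scatterplot(data=X, x="feature_1", y="feature_2", hue=y)  # color points by class
plt.title('Separation: {}'.format(separation))
plt.show()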
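The snippet above relies on a `make_data` helper that is not shown here. The following is a hypothetical stand-in, written under the assumption that `sep` controls how far apart the two classes lie: it draws a large majority blob and a small minority blob from Gaussians, shifting the minority blob by `sep` along both features. The sample sizes and seed are illustrative choices, not values from the book.

```python
import numpy as np
import pandas as pd


def make_data(sep=2.0, n_majority=990, n_minority=10, seed=0):
    """Hypothetical stand-in for the book's make_data helper.

    Returns a DataFrame with columns feature_1 and feature_2 and a Series of
    labels (0 = majority, 1 = minority). Larger `sep` moves the minority blob
    further from the majority blob, making the classes easier to separate.
    """
    rng = np.random.default_rng(seed)
    majority = rng.normal(loc=0.0, scale=1.0, size=(n_majority, 2))
    minority = rng.normal(loc=sep, scale=1.0, size=(n_minority, 2))
    X = pd.DataFrame(np.vstack([majority, minority]), columns=["feature_1", "feature_2"])
    y = pd.Series([0] * n_majority + [1] * n_minority, name="label")
    return X, y


X, y = make_data(sep=2)
print(y.value_counts())  # 990 majority vs 10 minority samples
```

Calling it with several values of `sep` and plotting each result is one way to see how class overlap interacts with imbalance.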
Following is what you need for this book: This book is for machine learning practitioners who want to effectively address the challenges of imbalanced datasets in their projects. Data scientists, data engineers, machine learning engineers and scientists, and research scientists and engineers will find this book helpful. Complete beginners are welcome, but some familiarity with core machine learning concepts will help readers get the most out of this book.
With the following software and hardware list, you can run all code files in the book (Chapters 1-10).
Chapter | Software required | OS required |
---|---|---|
1-10 | Google Colab | Any OS |
Kumar Abhishek is a seasoned Senior Machine Learning Engineer at Expedia Group, US, specializing in risk analysis and fraud detection for Expedia brands. He has over a decade of experience at companies such as Microsoft, Amazon, and a Bay Area startup, and holds an MS in Computer Science from the University of Florida.
Dr. Mounir Abdelaziz is a deep learning researcher specializing in computer vision applications. He holds a Ph.D. in computer science and technology from Central South University, China. During his Ph.D. journey, he developed innovative algorithms to address practical computer vision challenges. He has also authored numerous research articles in the field of few-shot learning for image classification.
- Introduction to Data Imbalance in Machine Learning
- Oversampling Methods
- Undersampling Methods
- Ensemble Methods
- Cost-Sensitive Learning
- Data Imbalance in Deep Learning
- Data-Level Deep Learning Methods
- Algorithm-Level Deep Learning Techniques
- Hybrid Deep Learning Methods
- Model Calibration