kumar-abhishek / Machine-Learning-for-Imbalanced-Data

Imbalanced Datasets in ML, published by Packt


Machine Learning for Imbalanced Data


This is the code repository for Machine Learning for Imbalanced Data, published by Packt.

Tackle imbalanced datasets using machine learning and deep learning techniques

What is this book about?

As machine learning practitioners, we often encounter imbalanced datasets in which one class has considerably fewer instances than the others. Many machine learning algorithms implicitly assume a roughly balanced distribution between majority and minority classes, which leads to suboptimal performance on imbalanced data. This comprehensive guide shows you how to address class imbalance and significantly improve model performance.
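To make this failure mode concrete, here is a minimal sketch using made-up labels (not data from the book). A "model" that always predicts the majority class scores high accuracy while never detecting a single minority instance. The helper names `accuracy` and `minority_recall` are illustrative, not taken from the book's notebooks.

```python
# A toy illustration of why accuracy misleads on imbalanced data.
# The labels are synthetic: 990 majority (0) and 10 minority (1) instances.
y_true = [0] * 990 + [1] * 10

# A trivial "model" that always predicts the majority class
y_pred = [0] * len(y_true)

def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true label
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def minority_recall(y_true, y_pred, positive=1):
    # Fraction of actual minority instances that the model flags as minority
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos

print(f"Accuracy: {accuracy(y_true, y_pred):.3f}")                # Accuracy: 0.990
print(f"Minority recall: {minority_recall(y_true, y_pred):.3f}")  # Minority recall: 0.000
```

The 99% accuracy simply reflects the class ratio. Metrics that focus on the minority class, such as recall, expose that the model has learned nothing useful.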

This book covers the following exciting features:

  • Use imbalanced data in your machine learning models effectively
  • Explore the metrics used when classes are imbalanced
  • Understand how and when to apply various sampling methods such as oversampling and undersampling
  • Apply data-based, algorithm-based, and hybrid approaches to deal with class imbalance
  • Combine and choose from various options for data balancing while avoiding common pitfalls
  • Understand the concepts of model calibration and threshold adjustment in the context of dealing with imbalanced datasets
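As an example of the simplest data-level approach, here is a from-scratch sketch of random oversampling. It duplicates randomly chosen minority instances until every class matches the size of the largest one. This is an illustration under our own assumptions (plain Python lists, a hypothetical `random_oversample` helper), not the book's code. In practice, libraries such as imbalanced-learn provide this as `RandomOverSampler`.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen instances of each smaller
    class until every class has as many instances as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        # Indices of the instances belonging to this class
        idx = [i for i, lbl in enumerate(y) if lbl == label]
        # Sample with replacement to make up the shortfall (k=0 for the largest class)
        for i in rng.choices(idx, k=target - count):
            X_res.append(X[i])
            y_res.append(y[i])
    return X_res, y_res

# Toy data: three majority instances, one minority instance
X = [[0.1, 1.2], [0.4, 0.9], [0.3, 1.1], [2.5, 3.0]]
y = [0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # Counter({0: 3, 1: 3})
```

Resample only the training split. Oversampling before a train/test split leaks duplicated minority instances into the evaluation data and inflates the reported metrics.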

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders, one per chapter.

The code will look like the following:

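import matplotlib.pyplot as plt
import seaborn as sns

# make_data is the book's data-generation helper (not shown in this excerpt);
# it is expected to return a feature DataFrame X and a label Series y
separation = 2
X, y = make_data(sep=separation)
print(y.value_counts())  # number of instances per class
sns.scatterplot(data=X, x="feature_1", y="feature_2")
plt.title('Separation: {}'.format(separation))
plt.show()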
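The sample above calls a `make_data` helper that isn't defined in this excerpt. To run it outside the book's notebooks, you can use the minimal hypothetical stand-in below. The function body, its extra parameters (`n_majority`, `n_minority`, `seed`), and the two-Gaussian-blob design are all assumptions, not the book's implementation.

```python
import numpy as np
import pandas as pd

def make_data(sep, n_majority=990, n_minority=10, seed=0):
    """Hypothetical stand-in for the book's helper: draws two 2-D Gaussian
    blobs, with the minority blob shifted by `sep` along each feature and a
    99:1 class ratio by default."""
    rng = np.random.default_rng(seed)
    majority = rng.normal(loc=0.0, scale=1.0, size=(n_majority, 2))
    minority = rng.normal(loc=sep, scale=1.0, size=(n_minority, 2))
    # Feature matrix with the column names the plotting code expects
    X = pd.DataFrame(np.vstack([majority, minority]), columns=["feature_1", "feature_2"])
    # Labels: 0 for majority, 1 for minority
    y = pd.Series([0] * n_majority + [1] * n_minority, name="label")
    return X, y

X, y = make_data(sep=2)
print(y.value_counts())
```

A larger `sep` pushes the minority blob further from the majority blob, which makes the classes easier to separate despite the imbalance.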

Who this book is for: This book is for machine learning practitioners who want to effectively address the challenges of imbalanced datasets in their projects. Data scientists, machine learning engineers and scientists, and research scientists and engineers will find it helpful. Complete beginners are welcome, but some familiarity with core machine learning concepts will help you get the most out of this book.

You can run all of the code files in the book (Chapters 1-10) with the following software and hardware:

Software and Hardware List

Chapter | Software required | OS required
--------|-------------------|------------
1-10    | Google Colab      | Any OS


Get to Know the Authors

Kumar Abhishek is a seasoned Senior Machine Learning Engineer at Expedia Group, US, specializing in risk analysis and fraud detection for Expedia brands. He has over a decade of experience at companies such as Microsoft, Amazon, and a Bay Area startup, and holds an MS in Computer Science from the University of Florida.

Dr. Mounir Abdelaziz is a deep learning researcher specializing in computer vision applications. He holds a Ph.D. in computer science and technology from Central South University, China. During his Ph.D., he developed innovative algorithms to address practical computer vision challenges. He has also authored numerous research articles on few-shot learning for image classification.


Table of Contents and Code Notebooks

  1. Introduction to Data Imbalance in Machine Learning
  2. Oversampling Methods
  3. Undersampling Methods
  4. Ensemble Methods
  5. Cost-Sensitive Learning
  6. Data Imbalance in Deep Learning
  7. Data-Level Deep Learning Methods
  8. Algorithm-Level Deep Learning Techniques
  9. Hybrid Deep Learning Methods
  10. Model Calibration

About


License: MIT License


Languages

Jupyter Notebook: 100.0%