There are 3 repositories under imbalance-classification topic.
ICDE'20 | A general & effective ensemble framework for imbalanced classification. | A general-purpose, efficient, and robust framework for class-imbalanced learning
NeurIPS’20 | Build powerful ensemble class-imbalanced learning models via meta-knowledge-powered resampler. | Designing a meta-knowledge-driven sampler to address class imbalance
This is the code for Addressing Class Imbalance in Federated Learning (AAAI-2021).
Papers about long-tailed tasks
ResLT: Residual Learning for Long-tailed Recognition (TPAMI 2022)
A general, feasible, and extensible framework for classification tasks.
Some tricks for handling imbalanced datasets
A Bonferroni Mean Based Fuzzy K-Nearest Centroid Neighbor (BM-FKNCN), BM-FKNN, FKNCN, FKNN, KNN Classifier
Credit card fraud is a burden for organizations across the globe. Specifically, $24.26 billion was lost to credit card fraud worldwide in 2018, according to shiftprocessing.com. In this project, our goal was to build an effective and efficient model to predict fraud. We analyzed a real-world dataset that contained a list of government-related credit card transactions over the 2010 calendar year. The data presented a supervised problem, as it included a column showing each transaction’s fraud label (whether a transaction was fraudulent or not). It also contained identifying information about each transaction, such as the credit card number, merchant, merchant state, etc. The dataset had 96,753 records and 10 data fields. We first described and visualized each of the 10 data fields, cleaned the dataset, and filled in missing values. Then we created many variables and performed feature selection. Finally, we created a variety of machine learning models (both linear and nonlinear) and highlighted our results.
Identify and classify toxic commentary
The Mulan Framework with Multi-Label Resampling Algorithms
Developed an NLP classification model that classifies negative restaurant reviews, helping restaurant managers save time when reviewing comments and absorbing information. It also analyzes service defects to help restaurants improve their business.
Trying to handle a small, imbalanced dataset in text sentiment analysis
This is a classification problem to detect fraud with a label of 0 or 1. Label 1 means fraud is detected; label 0 means it is not. The biggest challenge is handling the imbalanced dataset.
In-class Kaggle competition on predicting the bankruptcy of a firm
Machine Learning analysis for an imbalanced dataset. Developed as final project for the course "Machine Learning and Intelligent Systems" at Eurecom, Sophia Antipolis
Algorithms used to confirm whether a celestial body is a planet or not.
This project aims to predict credit risk using various ensemble machine learning techniques. I have also tried to handle imbalance by using various sampling methods.
In this repository, we implement Targeted Meta-Learning (or Targeted Data-driven Regularization) architecture for training machine learning models with biased data.
Contained in this repository are the Jupyter notebooks that contain the scripts used in this project. Examples include: exploratory data analysis, creation of training, validation and test data sets, and CNN model development and data extraction.
AmExpert 2019 - Machine Learning Hackathon
Built a model using XGBoost that predicts the chances of Attrition of an employee working at IBM with 84% Precision.
The following project aims at detecting the fraudulent credit card transactions while applying the various ML concepts right from Data Preparation, Feature Extraction, Model Validation, Hyper-param Tuning to Evaluation.
Deployment of a Flask API for the classification model, hosted on Heroku (OpenClassrooms | Data Scientist | Project 7)
Introductory code snippets that deal with the basics of data science and machine learning, which you can rely on anytime
This project is about detecting fraudulent credit card transactions. The dataset tends to be highly imbalanced, with less than 0.2% of the observations labelled as fraudulent. To address this issue we have to take into account the bank's objective (maximizing precision or recall) and restrictions. The performance and efficiency of many classification algorithms (Logistic Regression, XGBoost, Random Forests) were tested and compared.
Develop a neural network model that classifies cars, trucks, and cats while dealing with an imbalanced dataset. In addition, generate an adversarial image designed to deceive the trained model.
Anomaly detection using unsupervised, semi-supervised, and supervised machine learning methods
This was a comprehensive project completed as part of the Data Science PG Programme. It applies classification algorithms to a dataset of health/diagnostic variables to predict whether a person has diabetes. After extensive EDA to understand the distribution and other aspects of the data, pre-processing identified values that were missing or did not make sense within certain columns, and imputation techniques were used to treat the missing values. The class balance was also reviewed and treated using SMOTE. Finally, models were built and compared for accuracy on various metrics. The project also contains a Tableau dashboard on the original data.
A machine learning classification problem. The goal of this project is to build a model that borrowers can use to help make the best financial decisions (predicting whether a customer will experience financial delinquency in the next two years).
This notebook shows how the F1 metric differs from accuracy on imbalanced data. The heart disease dataset from Kaggle is used (https://www.kaggle.com/datasets/kamilpytlak/personal-key-indicators-of-heart-disease).
Using the Kaggle credit card fraud detection dataset, I applied both undersampling (with autoencoders) and oversampling (SMOTE) to predict credit card default.
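To illustrate the F1-versus-accuracy point, here is a minimal sketch in plain Python. It does not come from the notebook: the toy labels (95 negatives, 5 positives) and the helper names `confusion_counts`, `accuracy`, and `f1_score` are assumptions made for illustration. A classifier that always predicts the majority class scores high on accuracy yet gets an F1 of zero for the minority class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5

# A "classifier" that always predicts the majority class.
y_majority = [0] * 100

acc_majority = accuracy(y_true, y_majority)
f1_majority = f1_score(y_true, y_majority)
print(f"accuracy={acc_majority:.2f}, f1={f1_majority:.2f}")
```

Accuracy rewards the 95 correct negatives, while F1 only counts how well the positive class is recovered, so the two metrics diverge sharply as the imbalance grows.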
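As a rough illustration of the two resampling directions, the sketch below uses plain random undersampling and random oversampling. These are simpler stand-ins, not the autoencoder-based undersampling or SMOTE used in the repository, and the function names and toy data are hypothetical.

```python
import random


def _group_by_class(X, y):
    """Group feature rows by their class label."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows so every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
print(len(y_under), y_under.count(1))  # sizes after undersampling
print(len(y_over), y_over.count(1))    # sizes after oversampling
```

Resampling of this kind is normally applied to the training split only, so that the validation and test sets keep the original class distribution.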
This repo is about Machine Learning and Classification