NestorRV / undersampling

A Scala library for undersampling in imbalanced classification.

undersampling

By Néstor Rodríguez Vico

Documentation is available at https://nestorrv.github.io.

Included algorithms:

  • Balance Cascade. Original paper: "Exploratory Undersampling for Class-Imbalance Learning" by Xu-Ying Liu, Jianxin Wu and Zhi-Hua Zhou.

  • Class Purity Maximization algorithm. Original paper: "An Unsupervised Learning Approach to Resolving the Data Imbalanced Issue in Supervised Learning Problems in Functional Genomics" by Kihoon Yoon and Stephen Kwek.

  • ClusterOSS. Original paper: "ClusterOSS: a new undersampling method for imbalanced learning" by Victor H. Barella, Eduardo P. Costa and André C. P. L. F. Carvalho.

  • Condensed Nearest Neighbor decision rule. Original paper: "The Condensed Nearest Neighbor Rule" by P. Hart.

  • Easy Ensemble. Original paper: "Exploratory Undersampling for Class-Imbalance Learning" by Xu-Ying Liu, Jianxin Wu and Zhi-Hua Zhou.

  • Edited Nearest Neighbour rule. Original paper: "Asymptotic Properties of Nearest Neighbor Rules Using Edited Data" by Dennis L. Wilson.

  • Evolutionary Undersampling. Original paper: "Evolutionary Under-Sampling for Classification with Imbalanced Data Sets: Proposals and Taxonomy" by Salvador García and Francisco Herrera.

  • Instance Hardness Threshold. Original paper: "An Empirical Study of Instance Hardness" by Michael R. Smith, Tony Martinez and Christophe Giraud-Carrier.

  • Iterative Instance Adjustment for Imbalanced Domains. Original paper: "Addressing imbalanced classification with instance generation techniques: IPADE-ID" by Victoria López, Isaac Triguero, Cristóbal J. Carmona, Salvador García and Francisco Herrera.

  • NearMiss. Original paper: "kNN Approach to Unbalanced Data Distribution: A Case Study involving Information Extraction" by Jianping Zhang and Inderjeet Mani.

  • Neighbourhood Cleaning Rule. Original paper: "Improving Identification of Difficult Small Classes by Balancing Class Distribution" by J. Laurikkala.

  • One-Side Selection. Original paper: "Addressing the Curse of Imbalanced Training Sets: One-Sided Selection" by Miroslav Kubat and Stan Matwin.

  • Random Undersampling.

  • Tomek Link. Original paper: "Two Modifications of CNN" by Ivan Tomek.

  • Undersampling Based on Clustering. Original paper: "Under-Sampling Approaches for Improving Prediction of the Minority Class in an Imbalanced Dataset" by Show-Jane Yen and Yue-Shi Lee.
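
To give a feel for the simplest entry in the list, Random Undersampling, here is a minimal standalone sketch in plain Scala. It does not use this library's API: the object name `RandomUndersamplingSketch`, the function `randomUndersample` and its parameters are hypothetical and chosen only for illustration. The sketch assumes the goal is to shrink every class to the size of the smallest one by discarding randomly chosen instances.

```scala
import scala.util.Random

// Illustrative only: NOT the API of this library.
object RandomUndersamplingSketch {

  /** Keeps a random subset of each class so that every class ends up
    * with as many instances as the smallest (minority) class.
    *
    * @param data instances paired with their class label
    * @param seed seed for the random generator, for reproducibility
    */
  def randomUndersample[A, L](data: Seq[(A, L)], seed: Long): Seq[(A, L)] = {
    val rng = new Random(seed)
    // Group the instances by class label.
    val byClass: Map[L, Seq[(A, L)]] = data.groupBy(_._2)
    if (byClass.isEmpty) Seq.empty
    else {
      // The minority class size is the target size for every class.
      val target = byClass.values.map(_.size).min
      // Shuffle each class and keep only the first `target` instances.
      byClass.values.flatMap(rows => rng.shuffle(rows).take(target)).toSeq
    }
  }

  def main(args: Array[String]): Unit = {
    // Toy data set: 90 negative and 10 positive instances.
    val majority = Seq.tabulate(90)(i => (Array(i.toDouble, 0.0), "neg"))
    val minority = Seq.tabulate(10)(i => (Array(i.toDouble, 1.0), "pos"))

    val balanced = randomUndersample(majority ++ minority, seed = 42L)

    // Each class now has 10 instances.
    val counts = balanced.groupBy(_._2).map { case (label, rows) => label -> rows.size }
    println(counts)
  }
}
```

The other algorithms in the list replace the random choice with an informed one (for example, removing majority instances that form Tomek links or that are misclassified by their nearest neighbours). See the documentation for how the library itself exposes them.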

License: GNU General Public License v3.0