krishnanlab / MCC-F1-Curve-and-Metrics

MCC-F1 curve: a performance evaluation technique for binary classification

MCC-F1 Curve and Metrics

Based on the paper "The MCC-F1 curve: a performance evaluation technique for binary classification" (Cao, Chicco, & Hoffman, 2020), in which the authors combine two single-threshold metrics, the Matthews correlation coefficient (MCC) and the 𝐹1 score, into an MCC-F1 curve. They also compute a metric that summarizes the whole MCC-F1 curve in order to compare classifier performance across varying thresholds.
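For reference, both single-threshold metrics are computed from the four confusion-matrix counts at a given threshold, using the standard definitions MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)) and 𝐹1 = 2TP/(2TP + FP + FN). The helper below is a minimal sketch. `mcc_and_f1` is a hypothetical name, not a function from the repository, and returning 0.0 when a denominator is zero is an assumed convention.

```python
import math


def mcc_and_f1(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (MCC, F1) for one confusion matrix.

    Hypothetical helper. Assumption: an undefined score (zero denominator)
    is reported as 0.0; other tools may return NaN instead.
    """
    # Matthews correlation coefficient, in [-1, 1]
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    # F1 score (harmonic mean of precision and recall), written with counts
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom > 0 else 0.0
    return mcc, f1


# Example: 40 TP, 10 FP, 45 TN, 5 FN
mcc, f1 = mcc_and_f1(40, 10, 45, 5)
print(round(mcc, 4), round(f1, 4))
```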

The code computes the MCC-F1 curve and its related metrics:

  • It takes two inputs: the ground truths and the predicted values (real-valued scores given by a binary classifier).
  • The MCC-F1 function calculates the MCC and 𝐹1 scores across varying thresholds.
  • The MCC-F1 metric provides a measure for comparing classifiers, and also gives the best threshold 𝑇, i.e. the threshold whose point on the MCC-𝐹1 curve is closest to the point of perfect performance (1,1).
  • Plotting the MCC-F1 curve.
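The best-threshold rule in the list above is a nearest-point search. Below is a minimal sketch that assumes the curve is already available as parallel arrays of 𝐹1 scores, normalized MCC values and thresholds. `best_threshold` and the toy values are hypothetical and are not taken from the repository.

```python
import numpy as np


def best_threshold(f1, nmcc, thresholds):
    """Return (threshold, distance) for the curve point closest to (1, 1).

    Hypothetical helper: f1, nmcc and thresholds are parallel arrays with one
    entry per threshold; NaN entries (undefined scores) are ignored.
    """
    f1 = np.asarray(f1, dtype=float)
    nmcc = np.asarray(nmcc, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    # Euclidean distance from every curve point to perfect performance (1, 1)
    dist = np.sqrt((1.0 - f1) ** 2 + (1.0 - nmcc) ** 2)
    # nanargmin skips points whose distance is NaN
    i = int(np.nanargmin(dist))
    return thresholds[i], dist[i]


# Toy curve with three thresholds (illustrative values only)
t, d = best_threshold([0.6, 0.85, 0.7], [0.7, 0.9, 0.95], [0.2, 0.5, 0.8])
print(t, round(d, 4))
```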

The MCC-F1 function:

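From the ground truths and predicted values we calculate the Matthews correlation coefficient (MCC) and the 𝐹1 score of a scoring classifier. A scoring classifier outputs a real-valued prediction score 𝑓(𝑥𝑖) for each element 𝑥𝑖; a positive prediction (𝑦̂𝑖 = 1) is assigned when the score exceeds some threshold 𝜏, and a negative prediction (𝑦̂𝑖 = 0) otherwise. Repeating this over a range of thresholds gives one (MCC, 𝐹1) pair per threshold, and these pairs form the MCC-F1 curve. MCC is unit-normalized as (MCC + 1)/2, so that both scores lie in [0, 1] and the point of perfect performance is (1,1).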
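The threshold sweep described above can be sketched in Python with NumPy as follows. This is not the repository's implementation: the name `mcc_f1_curve`, the default of using each distinct score as a threshold, and returning NaN where a score is undefined are assumptions made for illustration.

```python
import math

import numpy as np


def mcc_f1_curve(y_true, y_score, thresholds=None):
    """Return (normalized MCC, F1, thresholds) for a scoring classifier.

    Hypothetical sketch. Assumptions: thresholds default to the distinct
    predicted scores, and undefined scores (zero denominators) are NaN.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    if thresholds is None:
        thresholds = np.unique(y_score)  # sorted distinct scores
    thresholds = np.asarray(thresholds, dtype=float)

    nmcc = np.full(thresholds.shape, np.nan)  # unit-normalized MCC per threshold
    f1 = np.full(thresholds.shape, np.nan)    # F1 score per threshold
    for k, tau in enumerate(thresholds):
        y_pred = y_score > tau  # positive prediction when the score exceeds tau
        tp = int(np.sum(y_pred & y_true))
        fp = int(np.sum(y_pred & ~y_true))
        tn = int(np.sum(~y_pred & ~y_true))
        fn = int(np.sum(~y_pred & y_true))
        # Python ints keep the product exact before taking the square root
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        if denom > 0:
            mcc = (tp * tn - fp * fn) / denom
            nmcc[k] = (mcc + 1.0) / 2.0  # map MCC from [-1, 1] to [0, 1]
        if 2 * tp + fp + fn > 0:
            f1[k] = 2 * tp / (2 * tp + fp + fn)
    return nmcc, f1, thresholds


# Usage with four toy labels and scores
nmcc, f1, thr = mcc_f1_curve([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(thr, nmcc, f1)
```

With the strict "score exceeds 𝜏" rule, the highest threshold predicts no positives, so its MCC is undefined (NaN here) while its 𝐹1 is 0.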

The MCC-F1 metric:

From the calculated MCC-F1 scores, we compute the MCC-F1 metric with the following steps:

  • Divide the range of normalized MCC values on the curve, [min𝑖 𝑋𝑖, max𝑖 𝑋𝑖], into 𝑊 = 100 sub-ranges, each of width 𝑤 = (max𝑖 𝑋𝑖 − min𝑖 𝑋𝑖)/𝑊.
  • Calculate the mean Euclidean distance from the points whose normalized MCC falls in each sub-range to the point of perfect performance (1,1).
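  • Calculate the grand average, i.e. the average of the mean distances across sub-ranges, and convert it to the MCC-𝐹1 metric as 1 − (grand average)/√2, where √2 is the largest possible distance to (1,1).
  • Better classifiers have MCC-𝐹1 curves closer to the point of perfect performance (1,1), and therefore a larger MCC-𝐹1 metric.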
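The steps above can be sketched as a single function. `mcc_f1_metric` is a hypothetical name, not the repository's API. Dropping undefined (NaN) points, skipping empty sub-ranges, and the final 1 − 𝐷/√2 mapping are assumptions of this sketch; the mapping is what makes "closer to (1,1)" correspond to "larger metric".

```python
import numpy as np


def mcc_f1_metric(nmcc, f1, n_bins=100):
    """Summarize an MCC-F1 curve given parallel arrays of curve points.

    Hypothetical sketch. Assumptions: NaN points are dropped, empty
    sub-ranges are skipped, and the grand average distance D is mapped
    to 1 - D / sqrt(2), so larger values mean better classifiers.
    """
    nmcc = np.asarray(nmcc, dtype=float)
    f1 = np.asarray(f1, dtype=float)
    keep = ~(np.isnan(nmcc) | np.isnan(f1))
    nmcc, f1 = nmcc[keep], f1[keep]

    # Distance of every curve point to the point of perfect performance (1, 1)
    dist = np.sqrt((1.0 - nmcc) ** 2 + (1.0 - f1) ** 2)

    # Split [min X, max X] of normalized MCC into n_bins sub-ranges of width w
    lo, hi = nmcc.min(), nmcc.max()
    w = (hi - lo) / n_bins
    means = []
    for b in range(n_bins):
        left, right = lo + b * w, lo + (b + 1) * w
        if b == n_bins - 1:
            # Close the last sub-range on the right so the maximum is included
            in_bin = (nmcc >= left) & (nmcc <= hi)
        else:
            in_bin = (nmcc >= left) & (nmcc < right)
        if in_bin.any():
            means.append(dist[in_bin].mean())  # mean distance in this sub-range

    grand_avg = float(np.mean(means))  # average of the per-sub-range means
    return 1.0 - grand_avg / np.sqrt(2.0)


# Usage on a toy two-point curve split into 2 sub-ranges
print(mcc_f1_metric([0.5, 1.0], [0.5, 1.0], n_bins=2))
```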

License: BSD 3-Clause "New" or "Revised" License


Languages: Jupyter Notebook 93.8%, Python 6.2%