alvaromc317 / metrics_computation

Given a True beta vector and a predicted beta vector, computes a set of metrics such as true positive rate, recall, etc

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

metrics_computation

Do you usually deal with high dimensional regression datasets and you need to compute a bunch of metrics for the beta coefficients of your model?

Then metrics_computation is your package. We assume here that beta coefficients that are different than 0 are positive, while coefficients that are equal to zero are negative. This package can easily compute:

  • true_positive_rate: True positive rate, sensitivity or recall = true positive / real positive
  • false_negative_rate: False negative rate = false negative / real positive (observe thar TPR + FNR = 1)
  • true_negative_rate: True negative rate or specificity = true negative / real negative
  • false_positive_rate: False positive rate = false positive / real negative (TNR + FPR = 1)
  • precision: Precision = true positive / (true positive + false positive)
  • f_score: F-score = 2*(precision*recall)/(precision+recall)
  • correct_selection_rate: Correct selection rate = (true positive + true negative) / total number
  • count_number_positives: Count the number of positives in both the predicted and the real beta array
  • store_beta: Sparse storage as an index-value pair of both predicted and real beta array

Usage example

We first generate a synthetic dataset using the data_generation package available here.

import numpy as np
import data_generation as dgen

data_equal = dgen.EqualGroupSize(n_obs=5200, ro=0.2, error_distribution='student_t', 
                                 e_df=5, random_state=1, group_size=10, non_zero_groups=7, 
                                 non_zero_coef=5, num_groups=15)

x, y, beta, group_index = data_equal.data_generation().values()

This generates a synthetic dataset that includes 5000 observations and 15x10=150 variables, among which 7x5=35 are different than zero and 115 are zero. Once we have the dataset, we obtain a prediction using a penalized regression model from the asgl package.

import asgl
lambda1 = 10.0 ** np.arange(-3, 1.51, 0.1)
tvt_alasso = asgl.TVT(model='lm', penalization='alasso', lambda1=lambda1, parallel=True,
                      weight_technique='pca_pct', variability_pct=0.9, error_type='MSE', 
                      random_state=1, train_size=100, validate_size=100)
alasso_result = tvt_alasso.train_validate_test(x=x, y=y)

alasso_prediction_error = alasso_result['test_error']
alasso_betas = alasso_result['optimal_betas'][1:] # Remove intercept

Now we have the true_betas and the alasso_betas obtained using an adaptive lasso model. Lets compute the metrics available in metrics_computation and see how good is our model:

import metrics_computation as mc
metrics_object = mc.MetricsComputation(metrics=['true_positive_rate', 'false_negative_rate', 'true_negative_rate',
                                                'false_positive_rate', 'precision', 'f_score',
                                                'correct_selection_rate', 'beta_error', 'count_number_positives',
                                                'store_beta'])
metrics = metrics_object.fit(predicted_beta=alasso_betas, true_beta=beta)

And the results obtained are:

print(metrics)
{'true_positive_rate': 1.0,
 'false_negative_rate': 0.0,
 'true_negative_rate': 0.5043478260869565,
 'false_positive_rate': 0.4956521739130435,
 'precision': 0.3804347826086957,
 'f_score': 0.5511811023622047,
 'correct_selection_rate': 0.62,
 'beta_error': 3.3391816482267616,
 'count_number_positives': {'number_positives_predicted_beta': 92,
  'number_positives_true_beta': 35},
 'store_beta': {'predicted_beta': {'index_non_zero_predicted_beta': array([  0,   1,   2,   3,   4,   5,   7,  10,  11,  12,  13,  14,  17,
           20,  21,  22,  23,  24,  25,  30,  31,  32,  33,  34,  35,  36,
           37,  38,  39,  40,  41,  42,  43,  44,  45,  46,  47,  49,  50,
           51,  52,  53,  54,  56,  57,  59,  60,  61,  62,  63,  64,  65,
           66,  67,  69,  71,  76,  77,  78,  79,  80,  85,  86,  87,  88,
           90,  91,  95,  97, 102, 103, 107, 108, 112, 114, 115, 117, 119,
          124, 125, 126, 129, 130, 133, 135, 137, 139, 140, 142, 145, 147,
          148]),
   'value_non_zero_predicted_beta': array([ 0.79339995,  1.87522975,  3.13763985,  3.61334788,  4.51621746,
           0.31125623,  0.54875971,  0.87955032,  1.81445199,  2.31443026,
           4.10932758,  5.42481166,  0.41256546,  1.12469476,  1.58801805,
           2.94509453,  3.13194602,  5.03623722,  0.13976405,  1.01355276,
           2.17043622,  2.66209677,  4.31500118,  5.35810521, -0.04337581,
          -0.10444598,  0.14715516,  0.11823591,  0.39685933,  1.16270464,
           0.66003076,  2.67933492,  3.11439734,  4.99887818,  0.4943931 ,
           0.79173587, -0.15166737,  0.01333736,  1.3140838 ,  2.29604342,
           2.42913034,  4.28719575,  4.68948029, -0.19252128,  0.09013478,
           0.0471931 ,  0.39352183,  2.59684326,  3.2423719 ,  3.70960104,
           4.47649098,  0.26081206,  0.01503026, -0.01503932,  0.18010438,
           0.3148776 , -0.19312873,  0.15733734, -0.2337122 , -0.06246009,
           0.43419795,  0.38280452,  0.00820053, -0.01194388,  0.1596238 ,
          -0.00966318,  0.00836905,  0.07707487,  0.00829384, -0.6871737 ,
           0.00859946,  0.25547944,  0.03750622,  0.20261963, -0.07859221,
          -0.05090997, -0.03631807, -0.23408105, -0.24707326, -0.10432976,
           0.01815028, -0.38785341,  0.19690068, -0.53279602, -0.1147006 ,
           0.10160788,  0.08420481, -0.34810438,  0.39998714,  0.18704431,
           0.21155179, -0.34328259])},
  'true_beta': {'index_non_zero_true_beta': array([ 0,  1,  2,  3,  4, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31,
          32, 33, 34, 40, 41, 42, 43, 44, 50, 51, 52, 53, 54, 60, 61, 62, 63,
          64]),
   'value_non_zero_true_beta': array([1., 2., 3., 4., 5., 1., 2., 3., 4., 5., 1., 2., 3., 4., 5., 1., 2.,
          3., 4., 5., 1., 2., 3., 4., 5., 1., 2., 3., 4., 5., 1., 2., 3., 4.,
          5.])}}}

About

Given a True beta vector and a predicted beta vector, computes a set of metrics such as true positive rate, recall, etc

License:GNU General Public License v3.0


Languages

Language:Python 100.0%