[ASK] MIND test dataset doesn't work for run_eval
ubergonmx opened this issue
Description
The following line:

```python
label = [0 for i in impr.split()]
```

essentially marks every news ID in the impression list as non-clicked. Instead of modifying the code, I edited the test behaviors file and appended `-0` to each news ID in the impression list (e.g., `N712-0 N231-0`).
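For context, here is a minimal sketch of how the `-1`/`-0` click suffix in a MIND behaviors impression column maps to labels. The function name `parse_impression` is hypothetical (not the recommenders API); the suffix convention is the standard MIND format, and the unlabeled test file simply has bare IDs:

```python
# Sketch of parsing a MIND behaviors impression column into labels.
# Labeled train/dev entries carry a -1/-0 click suffix (e.g. "N712-1 N231-0");
# the unlabeled test file has bare IDs ("N712 N231"), which is why the loader
# falls back to label 0 for every entry.

def parse_impression(impr: str):
    """Split an impression string into (news_id, label) pairs.

    Entries without a "-0"/"-1" suffix (the test set) default to label 0.
    """
    pairs = []
    for entry in impr.split():
        if "-" in entry:
            news_id, label = entry.rsplit("-", 1)
            pairs.append((news_id, int(label)))
        else:
            pairs.append((entry, 0))  # unlabeled test-style entry
    return pairs

print(parse_impression("N712-1 N231-0"))  # mixed labels: AUC is defined
print(parse_impression("N712 N231"))      # all zeros: AUC is undefined
```

Appending `-0` to every test entry therefore reproduces exactly what the loader already does: every group has a single class, and group AUC cannot be computed.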
Now I get the following error after running `run_eval`:

```python
model.run_eval(test_news_file, test_behaviors_file)
```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File <timed exec>:1
File ~/.conda/envs/recommenders/lib/python3.9/site-packages/recommenders/models/newsrec/models/base_model.py:335, in BaseModel.run_eval(self, news_filename, behaviors_file)
331 else:
332 _, group_labels, group_preds = self.run_slow_eval(
333 news_filename, behaviors_file
334 )
--> 335 res = cal_metric(group_labels, group_preds, self.hparams.metrics)
336 return res
File ~/.conda/envs/recommenders/lib/python3.9/site-packages/recommenders/models/deeprec/deeprec_utils.py:594, in cal_metric(labels, preds, metrics)
591 res["hit@{0}".format(k)] = round(hit_temp, 4)
592 elif metric == "group_auc":
593 group_auc = np.mean(
--> 594 [
595 roc_auc_score(each_labels, each_preds)
596 for each_labels, each_preds in zip(labels, preds)
597 ]
598 )
599 res["group_auc"] = round(group_auc, 4)
600 else:
File ~/.conda/envs/recommenders/lib/python3.9/site-packages/recommenders/models/deeprec/deeprec_utils.py:595, in <listcomp>(.0)
591 res["hit@{0}".format(k)] = round(hit_temp, 4)
592 elif metric == "group_auc":
593 group_auc = np.mean(
594 [
--> 595 roc_auc_score(each_labels, each_preds)
596 for each_labels, each_preds in zip(labels, preds)
597 ]
598 )
599 res["group_auc"] = round(group_auc, 4)
600 else:
File ~/.conda/envs/recommenders/lib/python3.9/site-packages/sklearn/metrics/_ranking.py:567, in roc_auc_score(y_true, y_score, average, sample_weight, max_fpr, multi_class, labels)
565 labels = np.unique(y_true)
566 y_true = label_binarize(y_true, classes=labels)[:, 0]
--> 567 return _average_binary_score(
568 partial(_binary_roc_auc_score, max_fpr=max_fpr),
569 y_true,
570 y_score,
571 average,
572 sample_weight=sample_weight,
573 )
574 else: # multilabel-indicator
575 return _average_binary_score(
576 partial(_binary_roc_auc_score, max_fpr=max_fpr),
577 y_true,
(...)
580 sample_weight=sample_weight,
581 )
File ~/.conda/envs/recommenders/lib/python3.9/site-packages/sklearn/metrics/_base.py:75, in _average_binary_score(binary_metric, y_true, y_score, average, sample_weight)
72 raise ValueError("{0} format is not supported".format(y_type))
74 if y_type == "binary":
---> 75 return binary_metric(y_true, y_score, sample_weight=sample_weight)
77 check_consistent_length(y_true, y_score, sample_weight)
78 y_true = check_array(y_true)
File ~/.conda/envs/recommenders/lib/python3.9/site-packages/sklearn/metrics/_ranking.py:337, in _binary_roc_auc_score(y_true, y_score, sample_weight, max_fpr)
335 """Binary roc auc score."""
336 if len(np.unique(y_true)) != 2:
--> 337 raise ValueError(
338 "Only one class present in y_true. ROC AUC score "
339 "is not defined in that case."
340 )
342 fpr, tpr, _ = roc_curve(y_true, y_score, sample_weight=sample_weight)
343 if max_fpr is None or max_fpr == 1:
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
Is there any fix or workaround for this? How do I get the scores?
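As a sketch of one possible workaround (not the library's built-in behavior): when averaging group AUC over labeled data, skip impression groups whose labels contain only one class. Note this only helps with partially labeled data; with the unlabeled test set every group is all-zero, so no meaningful AUC can be computed at all:

```python
# Minimal reproduction of the error, plus a guarded average that skips
# single-class groups. Assumes scikit-learn and numpy are installed.
import numpy as np
from sklearn.metrics import roc_auc_score

group_labels = [[0, 0, 0], [1, 0, 0]]            # first group: single class
group_preds = [[0.2, 0.5, 0.1], [0.9, 0.3, 0.4]]

try:
    np.mean([roc_auc_score(l, p) for l, p in zip(group_labels, group_preds)])
except ValueError as e:
    print(e)  # "Only one class present in y_true. ..."

# Workaround: average only over groups where both classes are present.
aucs = [
    roc_auc_score(l, p)
    for l, p in zip(group_labels, group_preds)
    if len(set(l)) == 2
]
group_auc = float(np.mean(aucs)) if aucs else float("nan")
print(round(group_auc, 4))
```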
Other Comments
Originally posted by @ubergonmx in #1673 (comment)
I am trying to train the NAML model with the valid + test set.
It seems there is an error with AUC because only one class is present; it looks like all your labels belong to a single class. I would look into the data and make sure you have both positive and negative examples.
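The suggestion above can be checked with a short script (a hypothetical helper, assuming the standard tab-separated MIND behaviors layout with the impression list in the last column):

```python
# Count how many impression lines in a MIND-style behaviors.tsv lack a
# positive (-1) or negative (-0) entry. Assumed column layout:
# impression_id, user_id, time, history, impressions.
import csv

def check_behaviors(path: str):
    no_pos = no_neg = total = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            entries = row[-1].split()
            labels = {e.rsplit("-", 1)[-1] for e in entries if "-" in e}
            total += 1
            if "1" not in labels:
                no_pos += 1  # group with no positives: AUC undefined
            if "0" not in labels:
                no_neg += 1  # group with no negatives: AUC undefined
    print(f"{total} impressions: {no_pos} without positives, "
          f"{no_neg} without negatives")
    return total, no_pos, no_neg
```

Any nonzero count means `roc_auc_score` will raise the single-class error for that group.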
> It seems there is an error with AUC because only one class is present; it looks like all your labels belong to a single class. I would look into the data and make sure you have both positive and negative examples.
Thank you. I think this was also an issue before, but it was closed because the test set has no labels.
Sounds good