google-research / robustness_metrics


How to integrate OOD metrics

batzner opened this issue

I would like to extend the code to also evaluate the performance on images that are completely out-of-distribution. For that, the user would need to be able to specify both an in-distribution dataset (e.g., CIFAR-10) and an out-of-distribution dataset (e.g., SVHN).

Currently, the code is designed around pairs of one metric and one dataset (e.g., accuracy@imagenet, brier@imagenet). What would be the best approach to extending it so that a user can specify two datasets for a single metric, such as aucroc@cifar10&svhn? Have you already considered this scenario? Where would be a good place for me to start?

Hello! We have the concept of a report that should cover this scenario:

`class Report(metaclass=abc.ABCMeta):`

You can specify that you need measurements on both datasets, and then combine them. Would this work?
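
For concreteness, here is a minimal sketch of what such a report could look like. The hook names (`required_measurements`, `add_measurement`) mirror the base class above but are illustrative rather than the library's exact signatures, and scikit-learn's `roc_auc_score` stands in for whatever AUC computation you end up using:

```python
from typing import Dict, List, Text

from sklearn.metrics import roc_auc_score


class OodAucReport:  # in practice, subclass the library's Report
    """Sketch: combines measurements from an in-distribution and an OOD dataset."""

    def __init__(self, in_dataset: Text = "cifar10", ood_dataset: Text = "svhn"):
        self._in_dataset = in_dataset
        self._ood_dataset = ood_dataset
        # Per-dataset OOD scores, e.g. negative max-probability per example.
        self._scores: Dict[Text, List[float]] = {in_dataset: [], ood_dataset: []}

    @property
    def required_measurements(self) -> List[Text]:
        # Ask the runner to evaluate the model on both datasets.
        return [self._in_dataset, self._ood_dataset]

    def add_measurement(self, dataset_name: Text, scores: List[float]) -> None:
        self._scores[dataset_name].extend(scores)

    def result(self) -> Dict[Text, float]:
        # Label in-distribution examples 0 and OOD examples 1, then compute
        # a single AUC-ROC over the pooled scores.
        labels = ([0] * len(self._scores[self._in_dataset])
                  + [1] * len(self._scores[self._ood_dataset]))
        scores = self._scores[self._in_dataset] + self._scores[self._ood_dataset]
        return {"auc_roc": roc_auc_score(labels, scores)}
```

The report, rather than any single metric, would then be the place where results from the two datasets meet.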

Hi @josipd, thank you for the quick answer and the idea of combining the two datasets' measurements in a report. In the case of AUC-ROC, I think the AUC metric instance would need to be fed predictions from both the in-distribution and the OOD dataset through:

`def add_predictions(self, model_predictions: types.ModelPredictions) -> None:`

If I understand it correctly, the metric would otherwise not be able to return a single float value when its `result` method is called:

`def result(self) -> Dict[Text, float]:`

What would be a good approach for feeding a single metric with predictions from two datasets? Or do you think there is a better alternative?
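
To make the problem concrete, a single metric computing AUC-ROC would have to accumulate predictions across both datasets, roughly like the sketch below. The `is_ood` metadata flag and the exact shape of `model_predictions` are assumptions on my side, not something the library provides today:

```python
from typing import Dict, Text

import numpy as np
from sklearn.metrics import roc_auc_score


class AucRocMetric:  # in practice, subclass the library's Metric
    """Sketch: pools predictions from two datasets into one AUC-ROC."""

    def __init__(self):
        self._labels = []
        self._scores = []

    def add_predictions(self, model_predictions) -> None:
        # Hypothetical: assumes the metadata records which dataset the
        # example came from, and that `predictions` holds the predicted
        # class probabilities for that example.
        self._labels.append(int(model_predictions.metadata["is_ood"]))
        probs = np.asarray(model_predictions.predictions)
        # Negative max-probability as the OOD score: low-confidence
        # predictions are scored as more likely out-of-distribution.
        self._scores.append(-float(np.max(probs)))

    def result(self) -> Dict[Text, float]:
        return {"auc_roc": roc_auc_score(self._labels, self._scores)}
```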

I see! The easiest approach would be to create a union dataset that loads and concatenates the two wrapped datasets. For this you can use tf.data.Dataset.concatenate: https://www.tensorflow.org/api_docs/python/tf/data/Dataset#concatenate
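
For example, a minimal sketch using TFDS directly (the dataset names, splits, and the `is_ood` tagging are illustrative; in the library you would wrap the registered datasets instead):

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Illustrative: load the raw datasets straight from TFDS.
in_ds = tfds.load("cifar10", split="test")
ood_ds = tfds.load("svhn_cropped", split="test")


def tag(is_ood):
    # Keep only the features the two datasets share so their element specs
    # match (which `concatenate` requires), and record each example's origin.
    def _map(example):
        return {
            "image": example["image"],
            "label": example["label"],
            "is_ood": tf.constant(is_ood),
        }
    return _map


union_ds = in_ds.map(tag(False)).concatenate(ood_ds.map(tag(True)))
```

Tagging each example with its origin also gives the metric exactly the label it needs to separate in-distribution from OOD examples when computing the AUC-ROC.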

That makes sense. Thanks for the suggestion, I will do it this way!