donalee / DeepMCDD

Multi-class Data Description for Out-of-distribution Detection

A small question about the comparison experiment

StevenTian97 opened this issue · comments

Hello, dear authors! I'm very interested in your paper, but I have some doubts about Section 4.1.1, which says: "Note that any OOD samples are not available at training time, so we do not consider advanced calibration techniques for all the methods; for example, temperature scaling, input perturbation [2, 15], and regression-based feature ensemble [14]."
I have read ODIN, but I didn't see any description saying that OOD samples are used during training. I really want to understand the reason. Could you help me figure it out? Thank you so much ^_^

Hello! Precisely speaking, such calibration methods utilize OOD samples for validation, not for training. For example, in the ODIN paper [15], the authors describe that "For each out-of-distribution dataset, we randomly hold out 1000 images for tuning the parameters T and epsilon. [...] The optimal parameters are chosen to minimize the FPR at TPR 95% on the holdout set". (Please refer to the paragraph "Choosing parameters".) In other words, temperature scaling and input perturbation involve several key parameters that must be tuned on a holdout set of OOD samples. In particular, their evaluation is unfair in that the same OOD distribution is used for both validation and testing. From this perspective, our paper points out that using OOD samples even for validation does not make sense, because it implicitly assumes a specific test distribution that cannot be known before we actually encounter it.
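To make the dependence on held-out OOD data concrete, here is a minimal sketch of ODIN-style scoring (temperature scaling plus input perturbation) and the grid search over T and epsilon on an OOD holdout set, assuming a standard PyTorch classifier `model`. The function names `odin_score` and `tune_on_holdout` and the candidate parameter grids are illustrative only; they are not taken from the DeepMCDD codebase or the official ODIN implementation.

```python
import torch
import torch.nn.functional as F

def odin_score(model, x, temperature=1000.0, epsilon=0.0014):
    """ODIN-style confidence score: temperature scaling + input perturbation.

    `temperature` (T) and `epsilon` are the parameters that the ODIN paper
    tunes on a held-out set of OOD samples.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)
    # Perturb the input against the gradient of the negative log-probability
    # of the predicted class, computed with temperature-scaled softmax.
    log_probs = F.log_softmax(logits / temperature, dim=1)
    loss = -log_probs.max(dim=1).values.sum()
    loss.backward()
    x_perturbed = x - epsilon * x.grad.sign()
    with torch.no_grad():
        calibrated = F.softmax(model(x_perturbed) / temperature, dim=1)
    return calibrated.max(dim=1).values  # higher => more in-distribution

def tune_on_holdout(model, id_loader, ood_holdout_loader,
                    temperatures=(1.0, 10.0, 100.0, 1000.0),
                    epsilons=(0.0, 0.0007, 0.0014, 0.002)):
    """Grid-search (T, epsilon) to minimize FPR at 95% TPR on an OOD holdout
    set -- exactly the use of OOD samples for validation discussed above."""
    best, best_fpr = None, float("inf")
    for T in temperatures:
        for eps in epsilons:
            id_scores = torch.cat([odin_score(model, x, T, eps) for x, _ in id_loader])
            ood_scores = torch.cat([odin_score(model, x, T, eps) for x, _ in ood_holdout_loader])
            # Threshold at the 5th percentile of in-distribution scores => 95% TPR.
            thr = torch.quantile(id_scores, 0.05)
            fpr = (ood_scores >= thr).float().mean().item()
            if fpr < best_fpr:
                best, best_fpr = (T, eps), fpr
    return best, best_fpr
```

Because the chosen (T, epsilon) depend on which OOD distribution fills `ood_holdout_loader`, the resulting detector is implicitly specialized to that distribution, which is the point raised in Section 4.1.1.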

Thank you so much for your patience in resolving my problem! It helps me a lot, and I truly appreciate your hard work. I hope to have the chance to communicate with you further.