aangelopoulos / conformal-prediction

Lightweight, useful implementation of conformal prediction on real data.

Home Page: http://people.eecs.berkeley.edu/~angelopoulos/blog/posts/gentle-intro/


Score function for APS

szalouk opened this issue

Hello,

Thank you for providing these notebooks for conformal prediction, they have been immensely helpful.

Reading through Section 2.1 of the paper ("Classification with Adaptive Prediction Sets") and the associated notebook, I had some questions about the score function.

Namely, the paper provides the score function

$$s(x,y) = \sum_{j=1}^k \hat{f}(x)_{\pi_j(x)}$$

where $\pi(x)$ sorts the classes from most to least likely and $k$ is the rank of the true label, i.e. $y = \pi_k(x)$. Why are we including $\hat{f}(x)_{y}$ itself in the sum? Doing so can produce problematic scores. Consider, for example, a perfect predictor that assigns all of its mass to the correct label $y$, and a completely wrong predictor that assigns all of its mass to a single incorrect label $\ne y$. Both predictors receive the same score of 1. This breaks the assumption that a higher score corresponds to greater misalignment between the forecaster and the true label.
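
For concreteness, this is how I read the score from the formula, as a minimal numpy sketch (my own reconstruction with my own names, not the notebook's code, and ignoring any randomization):

```python
import numpy as np

def aps_score(softmax, labels):
    """Score as written in the paper: cumulative sorted probability
    up to and including the true label.
    softmax: (n, K) predicted probabilities, labels: (n,) true classes."""
    order = np.argsort(-softmax, axis=1)                 # classes sorted by descending probability
    sorted_probs = np.take_along_axis(softmax, order, axis=1)
    cumsum = np.cumsum(sorted_probs, axis=1)             # cumulative mass after each sorted class
    rank = np.where(order == labels[:, None])[1]         # position k of the true label in the sort
    return cumsum[np.arange(len(labels)), rank]          # includes f(x)_y itself
```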

Investigating this further, I tried modifying the score function to greedily include all classes up to, but not including, the true label. Intuitively, a higher score would then correspond to more probability mass assigned to incorrect labels, which seems like a better measure of misalignment. When I coded this up in the APS notebook, this small fix increased the coverage slightly and, more importantly, decreased the mean size of the confidence sets to 3.3 (compared to 187.5 in the original notebook). The confidence sets on the ImageNet examples also look more sensible on preliminary inspection. This could possibly address an issue raised previously.
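
Concretely, the modification amounts to dropping $\hat{f}(x)_y$ from the sum, i.e. scoring only the mass ranked strictly above the true label (again just a sketch, continuing the notation above):

```python
def aps_score_excluding_true(softmax, labels):
    """Modified score: total probability of classes ranked strictly
    above the true label, i.e. the sum up to but not including y."""
    order = np.argsort(-softmax, axis=1)
    sorted_probs = np.take_along_axis(softmax, order, axis=1)
    cumsum = np.cumsum(sorted_probs, axis=1)
    rank = np.where(order == labels[:, None])[1]
    true_prob = softmax[np.arange(len(labels)), labels]  # f(x)_y, removed from the running sum
    return cumsum[np.arange(len(labels)), rank] - true_prob
```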

Is there a typo/error in the score function of APS that would explain these results?

Thanks in advance!