Doubt Reason Based on Entropy
koaning opened this issue · comments
If a machine learning model is very "confident" then the proba scores will have low entropy. The most uncertain outcome is a uniform distribution which would contain high entropy. Therefore, it could be sensible to add entropy as a reason for doubt.
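To make the intuition concrete, here's a small numpy sketch (the function name is illustrative, not part of any library):

```python
import numpy as np

def shannon_entropy(probas):
    """Shannon entropy of a probability vector; 0 means fully confident."""
    p = np.clip(np.asarray(probas, dtype=float), 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

confident = [0.97, 0.02, 0.01]   # peaked -> low entropy
uniform = [1/3, 1/3, 1/3]        # uniform -> maximal entropy, log(3)
```

For three classes the uniform case gives log(3) ≈ 1.099, the maximum possible, while the peaked case stays close to zero.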
I wonder ... what's a reasonable threshold here?
I see two ways to use this:
- Return all predictions with a high uncertainty; E > T1
- Return all predictions with a high certainty, that don't match the dataset label; argmax(P) != Y, E < T2
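The two selection rules above could be sketched like this (function names and the entropy computation are my own, purely illustrative):

```python
import numpy as np

def _row_entropy(probas):
    p = np.clip(probas, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=1)

def high_uncertainty(probas, t1):
    """Rule 1: flag rows whose entropy exceeds t1 (E > T1)."""
    return _row_entropy(probas) > t1

def confident_mismatch(probas, y, t2):
    """Rule 2: flag confident predictions that disagree with the
    dataset label (argmax(P) != Y and E < T2)."""
    return (np.argmax(probas, axis=1) != y) & (_row_entropy(probas) < t2)
```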
I've been thinking about your question about the threshold, but I haven't been able to figure out a reasonable value. I've been combing through some literature related to this, but when such a threshold is used, it is typically just a hyperparameter that is tuned, without a theoretical argument behind it.
One thing that might help is to use the Normalized Shannon Entropy, since entropy values for distributions with different numbers of classes are difficult to compare. A method that I could see working would be to determine the threshold relative to the entropy distribution of the dataset. The first thing that comes to mind would be to consider the lowest/highest percentiles, although I think there are more clever tricks available.
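A sketch of the normalized variant, which divides by log(n_classes) so every value lands in [0, 1] (the function name is hypothetical, not the library's API):

```python
import numpy as np

def normalized_entropy(probas):
    """Shannon entropy scaled by log(n_classes), so each row lies in
    [0, 1] regardless of how many classes the model predicts."""
    probas = np.asarray(probas, dtype=float)
    p = np.clip(probas, 1e-12, 1.0)
    ent = -np.sum(p * np.log(p), axis=1)
    return ent / np.log(probas.shape[1])

# Threshold relative to the dataset's own entropy distribution,
# e.g. flag everything above the 95th percentile:
# cutoff = np.quantile(normalized_entropy(probas), 0.95)
```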
Normalized entropy, as described here, seems like a sound idea! Thanks for the mention 👍 I think I'm fine with keeping the threshold as a hyperparameter in this entropy-reason if that prevents adding an assumption to the stack. I think it'd be good to gather feedback anyway.
Return all predictions with a high certainty, that don't match the dataset label; argmax(P) != Y, E < T2
I'm wondering ... is this something best addressed via WrongPredictionReason? We may want to add a hyperparameter there for this use-case.
Hi!
I created a PR for version 1 of the entropy reason here. I went for a threshold of 0.5, just because it worked well for the iris dataset. 0.2 would have produced way too many non-zeros.
Best
Robert
Another way to tackle the "wtf should the threshold be" problem: maybe we can specify a quantile instead of an absolute threshold like 0.5. That is, we specify some quantile alpha and then flag only the share alpha of samples with the highest normalized Shannon entropies.
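A minimal sketch of what the quantile rule would look like (the function name and alpha default are made up for illustration):

```python
import numpy as np

def flag_top_quantile(norm_entropies, alpha=0.05):
    """Flag the alpha fraction of samples with the highest normalized
    entropy, instead of applying a fixed absolute cutoff."""
    norm_entropies = np.asarray(norm_entropies, dtype=float)
    cutoff = np.quantile(norm_entropies, 1 - alpha)
    return norm_entropies >= cutoff
```

The nice property is that the number of flagged samples is predictable up front: roughly alpha times the dataset size.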
I'm wondering ... is this something best addressed via WrongPredictionReason? We may want to add a hyperparameter there for this use-case.
We also have the ShortConfidence reason and the LongConfidence reason.
Maybe we can specify a quantile instead of an absolute threshold like 0.5.
Part of me likes the idea. But I'm worried that we may introduce a lot of hyperparams and that at the moment it's unclear how much more useful doubt based on entropy will be compared to the margin-based reason.
I think it's possible to use the Hoover index instead of entropy: it's easier to compute, it is always in the 0-1 range, and it has a clear interpretation (0 = equality/uniformity, 1 = inequality).
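For a probability row compared against the uniform distribution, the Hoover index is half the total absolute deviation from 1/n. A sketch (function name is my own; note that a one-hot row gives (n-1)/n, so the value approaches 1 as the class count grows):

```python
import numpy as np

def hoover_index(probas):
    """Hoover index of each row against the uniform distribution:
    0 = uniform (maximal doubt), near 1 = all mass on one class."""
    probas = np.asarray(probas, dtype=float)
    n = probas.shape[1]
    return 0.5 * np.sum(np.abs(probas - 1.0 / n), axis=1)
```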
There is also a bigger problem with this approach in a multiclass setting: assume you have 4 classes. If your probas are 0.25-0.25-0.25-0.25, then the entropy/uniformity measure will correctly flag them, but if you have something like 0-0.5-0.5-0 it will fail, even though this sample could still be mislabeled. The problem becomes even more severe with more classes. A straightforward solution would be to use a one-vs-rest scheme.
I'm wondering ... can we come up with a situation where entropy-based doubt can address issues that the other reasons cannot?
Fixed by #24