koaning / doubtlab

Doubt your data, find bad labels.

Home Page: https://koaning.github.io/doubtlab/

Doubt Reason Based on Entropy

koaning opened this issue · comments

If a machine learning model is very "confident", the proba scores will have low entropy. The most uncertain outcome is a uniform distribution, which has the highest entropy. Therefore, it could be sensible to add entropy as a reason for doubt.
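For concreteness, a minimal numpy sketch (the `probas` array is just a stand-in for whatever `model.predict_proba(X)` returns):

```python
import numpy as np

# Stand-in for the output of model.predict_proba(X): one row per example.
probas = np.array([
    [0.98, 0.01, 0.01],   # confident prediction    -> low entropy
    [0.34, 0.33, 0.33],   # near-uniform prediction -> high entropy
])

# Shannon entropy per row; the small epsilon guards against log(0).
entropy = -np.sum(probas * np.log(probas + 1e-12), axis=1)
print(entropy)  # roughly [0.11, 1.10]; the maximum for 3 classes is log(3) ≈ 1.10
```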

I wonder ... what's a reasonable threshold here?

I see two ways to use this (both sketched below):

  • Return all predictions with high uncertainty: E > T1
  • Return all predictions with high certainty that don't match the dataset label: argmax(P) != Y, E < T2
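Roughly, in numpy terms (T1 and T2 are just placeholders here, not proposed values):

```python
import numpy as np

def entropy_doubt(probas, y, T1=1.0, T2=0.2):
    """Sketch of the two rules above; T1 and T2 are placeholder thresholds."""
    entropy = -np.sum(probas * np.log(probas + 1e-12), axis=1)
    preds = probas.argmax(axis=1)
    # Rule 1: high uncertainty, regardless of the dataset label.
    uncertain = entropy > T1
    # Rule 2: confident prediction that disagrees with the dataset label.
    confident_mismatch = (preds != y) & (entropy < T2)
    return uncertain, confident_mismatch
```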

I've been thinking about your question about the threshold, but I haven't been able to figure out a reasonable value. I've been combing through some literature related to this, but when such a threshold is used, it is usually just a hyperparameter that is tuned, without a theoretical argument.
One thing that might help is to use the Normalized Shannon Entropy, since entropy values for distributions with different numbers of classes are difficult to compare. A method that I could see working would be to determine the threshold relative to the entropy distribution of the dataset. The first thing that comes to mind would be to consider the lowest/highest percentiles, although I think there are more clever tricks available.
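As a sketch of what I mean (the Dirichlet probas and the 95th percentile are just illustrative stand-ins):

```python
import numpy as np

def normalized_entropy(probas, eps=1e-12):
    """Shannon entropy divided by log(n_classes), so values land in [0, 1]."""
    entropy = -np.sum(probas * np.log(probas + eps), axis=1)
    return entropy / np.log(probas.shape[1])

# Placeholder probas; in practice this would be model.predict_proba(X).
rng = np.random.default_rng(0)
probas = rng.dirichlet(alpha=np.ones(4), size=1000)

# Threshold relative to the dataset's own entropy distribution:
# e.g. flag the top 5% (the 95th percentile is an illustrative choice).
ent = normalized_entropy(probas)
flagged = ent >= np.percentile(ent, 95)
```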

Normalized entropy, as described here, seems like a sound idea! Thanks for the mention 👍 I think I'm fine with keeping the threshold as a hyperparameter in this entropy reason if that prevents adding an assumption to the stack. I think it'd be good to gather feedback anyway.

Return all predictions with high certainty that don't match the dataset label: argmax(P) != Y, E < T2

I'm wondering ... is this something best addressed via WrongPredictionReason? We may want to add a hyperparameter there for this use-case.

Hi!

I created a PR for version 1 of the entropy reason here. I went for a threshold of 0.5, just because it worked well for the iris dataset. 0.2 would have produced way too many non-zeros.
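For reference, this is roughly the kind of check behind those numbers (a sketch, not the actual PR code):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
probas = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)

# Normalized Shannon entropy per example, in [0, 1].
ent = -np.sum(probas * np.log(probas + 1e-12), axis=1) / np.log(probas.shape[1])

# Count how many examples each candidate threshold would flag as doubtful.
print((ent > 0.5).sum(), (ent > 0.2).sum())
```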

Best
Robert

Another way to tackle the "wtf should the threshold be" problem: maybe we can specify a quantile instead of an absolute threshold like 0.5. This means we can specify some quantile alpha and then flag only the alpha share of samples with the highest normalized Shannon entropies.
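A rough sketch of that (alpha here is whatever share of the data you're willing to re-inspect):

```python
import numpy as np

def flag_by_quantile(normalized_entropies, alpha=0.05):
    """Flag the alpha share of examples with the highest normalized entropy."""
    cutoff = np.quantile(normalized_entropies, 1 - alpha)
    return normalized_entropies >= cutoff
```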

I'm wondering ... is this something best addressed via WrongPredictionReason? We may want to add a hyperparameter there for this use-case.

We also have the ShortConfidence reason and the LongConfidence reason.

Maybe we can specify a quantile instead of an absolute threshold like 0.5.

Part of me likes the idea. But I'm worried that we may introduce a lot of hyperparameters, and at the moment it's unclear how much more useful entropy-based doubt will be compared to the margin-based reason.

I think it's possible to use the Hoover index instead of entropy: it's easier to compute, it is always in the 0-1 range, and it has a clear interpretation (0 = equality/uniformity, 1 = inequality).
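A sketch of what that could look like per proba row (note that it tops out at 1 - 1/K for a one-hot row, rather than exactly 1):

```python
import numpy as np

def hoover_index(probas):
    """Half the total absolute deviation from the uniform distribution.

    0 means perfectly uniform probas; a one-hot row gives 1 - 1/K,
    which is the maximum for K classes.
    """
    k = probas.shape[1]
    return 0.5 * np.sum(np.abs(probas - 1.0 / k), axis=1)

print(hoover_index(np.array([[0.25, 0.25, 0.25, 0.25],
                             [1.00, 0.00, 0.00, 0.00]])))  # [0.   0.75]
```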

There is also a bigger problem with this approach in the multiclass setting: assume you have 4 classes. If your probas are 0.25-0.25-0.25-0.25, then the entropy/uniformity measure will correctly flag them, but if you have something like 0-0.5-0.5-0, it will fail, even though this sample could still be mislabeled. This problem becomes even more severe with more classes. A straightforward solution would be to use a one-vs-rest scheme.
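To make the failure case concrete (the one-vs-rest part below is just one way it could be done):

```python
import numpy as np

def normalized_entropy(p, eps=1e-12):
    return -np.sum(p * np.log(p + eps), axis=-1) / np.log(p.shape[-1])

uniform = np.array([0.25, 0.25, 0.25, 0.25])
two_way = np.array([0.0, 0.5, 0.5, 0.0])

print(normalized_entropy(uniform))  # 1.0 -> flagged
print(normalized_entropy(two_way))  # 0.5 -> likely below a high threshold, not flagged

# One-vs-rest: binary entropy per class; the 0.5-vs-0.5 conflict shows up
# as maximal entropy for classes 1 and 2.
def ovr_binary_entropy(p, eps=1e-12):
    q = 1.0 - p
    return -(p * np.log(p + eps) + q * np.log(q + eps)) / np.log(2)

print(ovr_binary_entropy(two_way))  # approx [0., 1., 1., 0.]
```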

I'm wondering ... can we come up with a situation where entropy-based doubt can address issues that the other reasons cannot?