McSinyx / viwikipi

Vietnamese Wikipedia Paraphrase Identification experiments


Ill-defined evaluation on XLM

McSinyx opened this issue

As raised by sklearn.metrics.classification:

UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 due to no predicted samples.

I have not investigated this any further.
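For reference, the warning itself is easy to reproduce in isolation. Here is a minimal sketch with made-up labels (not our actual data), assuming scikit-learn's default binary F-score:

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up gold labels and predictions: the model never predicts
# the positive class, so precision for class 1 is 0/0 and the
# F-score is undefined.
labels = np.array([0, 1, 0, 1])
preds = np.array([0, 0, 0, 0])

# Emits UndefinedMetricWarning ("F-score is ill-defined and being
# set to 0.0 due to no predicted samples") and returns 0.0.
print(f1_score(labels, preds))
```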

I just googled around and I'm gonna put a couple of links here. It's still quite unclear to me, so I'd appreciate it if Dr. Tung could elaborate further.

Wikipedia: https://en.wikipedia.org/wiki/F1_score

DeepAI: https://deepai.org/machine-learning-glossary-and-terms/f-score

This is an SO question that probably answers this. I haven't checked, though.

Hmm... I just read the SO question. The answer says that compute_metric was called with an empty preds, and yeah, it was.

I've run the task a few times and now I'm sure that preds = np.argmax(preds, axis=1) around lines 250-260 is what caused this.

Before it:

```
preds = [[ 0.1553037   0.00259993]
 [ 0.30144626 -0.06917176]
 [ 0.07331852  0.02836521]
 [ 0.14101419  0.0055312 ]
 [ 0.20048355 -0.01591815]
 [ 0.20776229 -0.03293395]
 [ 0.29013884 -0.05500412]
 [ 0.3218491  -0.0702171 ]]
```

After it:

```
preds = [0 0 0 0 0 0 0 0]
```
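This collapse is exactly what np.argmax does with that output: in the dump above, column 0 is larger than column 1 in every row, so the index of the maximum is 0 for every sample. A quick check with the same values:

```python
import numpy as np

# The raw model outputs from the dump above: column 0 beats
# column 1 in every row, so argmax along axis 1 is always 0.
preds = np.array([[ 0.1553037 ,  0.00259993],
                  [ 0.30144626, -0.06917176],
                  [ 0.07331852,  0.02836521],
                  [ 0.14101419,  0.0055312 ],
                  [ 0.20048355, -0.01591815],
                  [ 0.20776229, -0.03293395],
                  [ 0.29013884, -0.05500412],
                  [ 0.3218491 , -0.0702171 ]])

print(np.argmax(preds, axis=1))  # [0 0 0 0 0 0 0 0]
```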

If I change that to argmin (so that the predicted indices are not zero), it returns f1 = 0.6666666666666666 and acc = 0.5 (see the check below), and the warning doesn't show up, since preds = [1 1 1 1 1 1 1 1]. I don't think this is the expected result, though. We need to somehow make it understand that [0 0 0 0 0 0 0 0] are class indices, not empty values.
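Those two numbers check out if the eight gold labels are half positive, which is what acc = 0.5 with all-ones predictions implies: precision = 4/8 = 0.5, recall = 4/4 = 1, so f1 = 2 * 0.5 * 1 / 1.5 = 2/3. A sketch (the label order below is an assumption; only the four-out-of-eight split matters):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Assumed gold labels: four positives out of eight (the only split
# consistent with acc = 0.5 when every prediction is 1).
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
preds = np.array([1, 1, 1, 1, 1, 1, 1, 1])  # the argmin output

print(accuracy_score(labels, preds))  # 0.5
print(f1_score(labels, preds))        # 0.6666666666666666
```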

Great catch 😛 Since this happened because we ran the test on too tiny a dataset--though I'm not saying it's absolutely impossible for this to happen on a larger one--I'm closing this now, as there is nothing we can do about it at the moment (and thanks, probability!).

Edit: don't mind the label, it's just that this is probably my only chance to use it in my entire life.

As posted by @trahoa in #3 (comment), XLM models are not evaluated (or perhaps even trained) correctly. I'm moving the discussion here.

This was due to a learning rate that was too high. After adjusting the Adam epsilon (to 0.0001) and the learning rate (to 0.000001), xlm-mlm-17-1280 converges just fine. Closing this issue for now.
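For the record, here is roughly what that adjustment looks like in a plain PyTorch setup. The model stand-in below is hypothetical (our training script may well use a different optimizer class, e.g. the one shipped with transformers); only the hyperparameter values are the ones reported above:

```python
import torch

# Hypothetical classification head standing in for xlm-mlm-17-1280;
# only the optimizer hyperparameters matter here.
model = torch.nn.Linear(1280, 2)

# A much smaller learning rate and a larger Adam epsilon than the
# defaults (lr=1e-3, eps=1e-8) let the model converge.
optimizer = torch.optim.Adam(model.parameters(), lr=0.000001, eps=0.0001)
```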