McSinyx / viwikipi

Vietnamese Wikipedia Paraphrase Identification experiments


Ill-defined evaluation on XLM

McSinyx opened this issue

As raised by sklearn.metrics.classification:

UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 due to no predicted samples.

I have not investigated this any further.
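For reference, the warning itself is easy to reproduce in isolation. Here is a minimal sketch with made-up labels (not our actual data), assuming scikit-learn's default binary F-score:

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up gold labels and predictions: the model never predicts
# the positive class, so precision for class 1 is 0/0 and the
# F-score is undefined.
labels = np.array([0, 1, 0, 1])
preds = np.array([0, 0, 0, 0])

# Emits UndefinedMetricWarning ("F-score is ill-defined and being
# set to 0.0 due to no predicted samples") and returns 0.0.
print(f1_score(labels, preds))
```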

I just googled around and I'm gonna put a couple of links here. It's still quite unclear to me, so I'd appreciate it if Dr. Tung could elaborate further.

Wikipedia: https://en.wikipedia.org/wiki/F1_score

DeepAI: https://deepai.org/machine-learning-glossary-and-terms/f-score

This is an SO question that probably answers this. I haven't checked, though.

Hmm... I just read the SO question. The answer says that compute_metric was called with an empty preds, and yeah, it was.

I've run the task a few times and now I'm sure that preds = np.argmax(preds, axis=1) around lines 250-260 is what caused this.

Before it:

```
preds = [[ 0.1553037   0.00259993]
 [ 0.30144626 -0.06917176]
 [ 0.07331852  0.02836521]
 [ 0.14101419  0.0055312 ]
 [ 0.20048355 -0.01591815]
 [ 0.20776229 -0.03293395]
 [ 0.29013884 -0.05500412]
 [ 0.3218491  -0.0702171 ]]
```

After it:

```
preds = [0 0 0 0 0 0 0 0]
```
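This collapse is exactly what np.argmax does with that output: in the dump above, column 0 is larger than column 1 in every row, so the index of the maximum is 0 for every sample. A quick check with the same values:

```python
import numpy as np

# The raw model outputs from the dump above: column 0 beats
# column 1 in every row, so argmax along axis 1 is always 0.
preds = np.array([[ 0.1553037 ,  0.00259993],
                  [ 0.30144626, -0.06917176],
                  [ 0.07331852,  0.02836521],
                  [ 0.14101419,  0.0055312 ],
                  [ 0.20048355, -0.01591815],
                  [ 0.20776229, -0.03293395],
                  [ 0.29013884, -0.05500412],
                  [ 0.3218491 , -0.0702171 ]])

print(np.argmax(preds, axis=1))  # [0 0 0 0 0 0 0 0]
```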

If I change that to argmin (so that the predicted indices are not zero), it returns f1 = 0.6666666666666666 and acc = 0.5 (see the check below), and the warning doesn't show up, since preds = [1 1 1 1 1 1 1 1]. I don't think this is the expected result, though. We need to somehow make it understand that [0 0 0 0 0 0 0 0] are class indices, not empty values.
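Those two numbers check out if the eight gold labels are half positive, which is what acc = 0.5 with all-ones predictions implies: precision = 4/8 = 0.5, recall = 4/4 = 1, so f1 = 2 * 0.5 * 1 / 1.5 = 2/3. A sketch (the label order below is an assumption; only the four-out-of-eight split matters):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Assumed gold labels: four positives out of eight (the only split
# consistent with acc = 0.5 when every prediction is 1).
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
preds = np.array([1, 1, 1, 1, 1, 1, 1, 1])  # the argmin output

print(accuracy_score(labels, preds))  # 0.5
print(f1_score(labels, preds))        # 0.6666666666666666
```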

Great catch 😛 Since this happened because we ran the test on too tiny a dataset--though I'm not saying it's absolutely impossible for this to happen on a larger one--I'm closing this now, as there is nothing we can do about it at the moment (and thanks, probability!).

Edit: don't mind the label, it's just that this is probably my only chance to use it in my entire life.

As posted by @trahoa in #3 (comment), XLM models are not evaluated (or perhaps even trained) correctly. I'm moving the discussion here.

This was due to a learning rate that was too high. After adjusting the Adam epsilon (to 0.0001) and the learning rate (to 0.000001), xlm-mlm-17-1280 converges just fine. Closing this issue for now.
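For the record, here is roughly what that adjustment looks like in a plain PyTorch setup. The model stand-in below is hypothetical (our training script may well use a different optimizer class, e.g. the one shipped with transformers); only the hyperparameter values are the ones reported above:

```python
import torch

# Hypothetical classification head standing in for xlm-mlm-17-1280;
# only the optimizer hyperparameters matter here.
model = torch.nn.Linear(1280, 2)

# A much smaller learning rate and a larger Adam epsilon than the
# defaults (lr=1e-3, eps=1e-8) let the model converge.
optimizer = torch.optim.Adam(model.parameters(), lr=0.000001, eps=0.0001)
```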