Ill-defined evaluation on XLM
McSinyx opened this issue · comments
As raised from sklearn.metrics.classification:
UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 due to no predicted samples.
I have not investigated this any further.
I just googled it and I'm putting the links here. It's still quite unclear to me, so I'd appreciate it if Dr. Tung could elaborate further.
Wikipedia: https://en.wikipedia.org/wiki/F1_score
DeepAI: https://deepai.org/machine-learning-glossary-and-terms/f-score
This is an SO question that probably answers this. I haven't checked, though.
Hmm... Just read the SO question. The answer says that compute_metric was used with an empty preds, and yeah, it is.
I've run the task a few times and now I'm sure that preds = np.argmax(preds, axis=1) around lines 250-260 caused the warning.
Before it:
preds = [[ 0.1553037 0.00259993]
[ 0.30144626 -0.06917176]
[ 0.07331852 0.02836521]
[ 0.14101419 0.0055312 ]
[ 0.20048355 -0.01591815]
[ 0.20776229 -0.03293395]
[ 0.29013884 -0.05500412]
[ 0.3218491 -0.0702171 ]]
After it:
preds = [0 0 0 0 0 0 0 0]
If I change that to argmin (so that the values are not zero), it returns f1 = 0.6666666666666666 and acc = 0.5, and the warning doesn't show up, since preds = [1 1 1 1 1 1 1 1]. Though, I don't think this is the expected result. We need to somehow make it understand that [0 0 0 0 0 0 0 0] are the indexes and not empty values.
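To make the arithmetic behind the warning concrete, here is a minimal sketch using only the standard library. It takes the logits dumped above, applies a pure-Python stand-in for np.argmax(preds, axis=1), and computes precision, recall, F1 and accuracy by hand. The labels are an assumption (they are not shown in this thread); four of each class was picked only because it reproduces the reported f1 and acc. The helper names argmax_rows and binary_scores are hypothetical and not part of the project or of sklearn.

```python
# Logits copied from the "Before it" dump: one row per sample, one column per class.
logits = [
    [0.1553037, 0.00259993],
    [0.30144626, -0.06917176],
    [0.07331852, 0.02836521],
    [0.14101419, 0.0055312],
    [0.20048355, -0.01591815],
    [0.20776229, -0.03293395],
    [0.29013884, -0.05500412],
    [0.3218491, -0.0702171],
]

# ASSUMPTION: the real labels are not shown in the thread. Any labelling with
# four samples per class reproduces the reported f1 and acc for all-ones preds.
labels = [0, 1, 0, 1, 0, 1, 0, 1]


def argmax_rows(rows):
    """Pure-Python stand-in for np.argmax(rows, axis=1)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


def binary_scores(labels, preds, positive=1):
    """Return (precision, recall, f1, accuracy); an undefined ratio is None."""
    tp = sum(1 for y, p in zip(labels, preds) if y == positive and p == positive)
    predicted_pos = sum(1 for p in preds if p == positive)
    actual_pos = sum(1 for y in labels if y == positive)
    # Precision is 0/0 when nothing is predicted positive: this is the
    # "no predicted samples" case that the warning reports.
    precision = tp / predicted_pos if predicted_pos else None
    recall = tp / actual_pos if actual_pos else None
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    accuracy = sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)
    return precision, recall, f1, accuracy


preds = argmax_rows(logits)            # every row favours class 0
print(preds)
print(binary_scores(labels, preds))    # precision and f1 are undefined (None)
print(binary_scores(labels, [1] * 8))  # the argmin case: all ones
```

Under these assumptions the zeros are valid class indexes, not empty values: the model simply predicts class 0 for every sample, so there are no predicted positives and precision has a zero denominator. sklearn reports that case as 0.0 with a warning instead of None.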
Great catch 😛 Since this happened because we ran the test on too tiny a dataset (though I'm not saying it's absolutely impossible for this to happen on a larger one), I'm closing this now as there is nothing we can do about it at the moment (and say thanks to probability!).
Edit: don't mind the label, it's just that this is probably my only chance to use it in my entire life.
As posted by @trahoa in #3 (comment), XLM models are not evaluated (or trained perhaps) correctly. I'm moving the discussion here.
This is due to a learning rate that was too high. After adjusting the Adam epsilon (to 0.0001) and the learning rate (to 0.000001), xlm-mlm-17-1280 converges just fine. Closing this issue for now.
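As a rough illustration of why those two settings matter, the sketch below computes the size of the first Adam update for a single scalar parameter, following the standard Adam update rule. It is not the project's training code. The gradient value and the "baseline" settings are hypothetical, since the thread does not say what the previous values were; only eps = 0.0001 and lr = 0.000001 come from the comment above.

```python
import math


def adam_first_step(grad, lr, eps, beta1=0.9, beta2=0.999):
    """Size of the first Adam update for one scalar parameter."""
    m = (1 - beta1) * grad           # first-moment estimate, starting from 0
    v = (1 - beta2) * grad * grad    # second-moment estimate, starting from 0
    m_hat = m / (1 - beta1)          # bias correction at step t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)


grad = 1e-4  # HYPOTHETICAL small gradient, chosen only for illustration

# HYPOTHETICAL baseline: the settings used before the fix are not given here.
step_baseline = adam_first_step(grad, lr=5e-5, eps=1e-8)

# Settings reported in the comment above.
step_tuned = adam_first_step(grad, lr=1e-6, eps=1e-4)

print(step_baseline, step_tuned)
```

With a small epsilon the step is close to lr regardless of how small the gradient is. A larger epsilon shrinks the step further when gradients are small, on top of the lower learning rate, which is consistent with the model training more gently after the change.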