Is the number of POSSIBLE different from the number of "B-" tokens?
Hans0124SG opened this issue
Thanks for the great library.
Just wondering about the logic of calculating the number of the POSSIBLE tokens.
If my pred is
B-ORG, I-ORG, B-ORG, I-ORG
and my true label is
B-ORG, I-ORG, I-ORG, I-ORG
I think the current logic will calculate POSSIBLE as 2, but there is only 1 gold-standard annotation.
If 2 is correct, that means POSSIBLE cannot be interpreted as the number of gold-standard entities in the data, am I right?
Hi @Hans0124SG thanks for raising an issue, can you write a short reproducible example?
Sure.
from nervaluate import compute_metrics, collect_named_entities
true = ['O', 'B-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'O']
pred = ['O', 'B-ORG', 'I-ORG', 'B-ORG', 'I-ORG', 'O']
result, entity_level_result = compute_metrics(collect_named_entities(true), collect_named_entities(pred), ['ORG'])
entity_level_result['ORG']
I get the following output:
{'strict': {'correct': 0,
'incorrect': 2,
'partial': 0,
'missed': 0,
'spurious': 0,
'precision': 0,
'recall': 0,
'f1': 0,
'actual': 2,
'possible': 2},
'ent_type': {'correct': 2,
'incorrect': 0,
'partial': 0,
'missed': 0,
'spurious': 0,
'precision': 0,
'recall': 0,
'f1': 0,
'actual': 2,
'possible': 2},
'partial': {'correct': 0,
'incorrect': 0,
'partial': 2,
'missed': 0,
'spurious': 0,
'precision': 0,
'recall': 0,
'f1': 0,
'actual': 2,
'possible': 2},
'exact': {'correct': 0,
'incorrect': 2,
'partial': 0,
'missed': 0,
'spurious': 0,
'precision': 0,
'recall': 0,
'f1': 0,
'actual': 2,
'possible': 2}}
Possible is 2, but there is only 1 entity in the true label sequence.
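To make the mismatch concrete, here is a minimal sketch that splits the two tag sequences above into entity spans. It is not nervaluate's implementation; `bio_spans` is a hypothetical helper written only for this illustration, and it assumes plain BIO tagging.

```python
def bio_spans(tags):
    """Return (type, start, end) spans from a BIO tag sequence (end is inclusive)."""
    spans, current = [], None  # current = [type, start, end] of the open span
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            current[2] = i  # extend the open span
            continue
        if current is not None:
            spans.append(tuple(current))  # close the open span
            current = None
        if tag != "O":
            current = [tag[2:], i, i]  # a B- tag (or a stray I- tag) opens a new span
    if current is not None:
        spans.append(tuple(current))
    return spans


true = ['O', 'B-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'O']
pred = ['O', 'B-ORG', 'I-ORG', 'B-ORG', 'I-ORG', 'O']

true_spans = bio_spans(true)  # [('ORG', 1, 4)]
pred_spans = bio_spans(pred)  # [('ORG', 1, 2), ('ORG', 3, 4)]

# Predicted spans that overlap at least one gold span.
overlapping = [
    p for p in pred_spans
    if any(p[1] <= t[2] and t[1] <= p[2] for t in true_spans)
]
print(len(true_spans), len(pred_spans), len(overlapping))  # 1 2 2
```

Both predicted spans overlap the single gold span, which would explain why it ends up being counted twice.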
Thanks for this @Hans0124SG. This looks like a bug: 'possible' should be interpreted as the maximum number of matches in the true data.
Thanks @ivyleavedtoadflax
I suspect that this is not really a bug.
According to the definition:
POS = COR + INC + PAR + MIS = TP + FN
However, since predicted entities are not always mapped 1-to-1 to true entities, TP + FN does not necessarily equal the number of positive labels.
That's why I think POS is simply not the total number of gold-standard entities.
What do you think?
Btw, this phenomenon exists in @davidsbatista's ner_evaluation library as well.
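The arithmetic can be checked directly against the 'strict' counts reported above. This is only a sketch: the counts are copied by hand from that output, and counting "B-" tags is assumed to be a fair proxy for the number of gold-standard entities.

```python
# Counts copied from the 'strict' block of the output above.
strict = {"correct": 0, "incorrect": 2, "partial": 0, "missed": 0, "spurious": 0}

# POS = COR + INC + PAR + MIS
possible = (
    strict["correct"] + strict["incorrect"] + strict["partial"] + strict["missed"]
)

true = ['O', 'B-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'O']
gold_entities = sum(tag.startswith("B-") for tag in true)  # one entity per B- tag

print(possible, gold_entities)  # 2 1
```

The single gold entity is matched against two predicted entities, so it contributes two INC counts, and POS comes out larger than the number of gold entities.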
Yes, you're right; I realized this while implementing a test for it. There are a few unexpected results like this, and I think another one was raised in the issues on @davidsbatista's original repo.
The solution is probably just to document them.
Yeah great, thanks for the confirmation. Hope this is useful for other people who have the same doubt.