English information extraction has incorrect F1 scores

Question

tomaarsen opened this issue 3 years ago

Tom Aarsen commented 3 years ago

Hello!

Short description

The information_extraction page lists F1 scores that are inconsistent with the precision and recall values reported alongside them.

Assumptions

This issue assumes that the F1 score is computed as:

$$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$

where P is the precision and R is the recall.

Beyond that, for the rest of this issue I assume that the listed precision and recall values are correct, and I use them to compute my own F1 scores. Apologies if these assumptions are incorrect.
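A minimal Python sketch of the assumed formula (the helper name `f1_from_pr` is only for illustration). Because F1 scales linearly with P and R, it gives the same answer whether the inputs are fractions or percentages, as on the page.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, in the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both P and R are 0
    return 2 * precision * recall / (precision + recall)


print(round(f1_from_pr(80.0, 60.0), 1))  # 68.6 (percentages)
print(round(f1_from_pr(0.8, 0.6), 3))    # 0.686 (fractions)
```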

Detailed concerns

For the Base dataset, both papers are listed with an F1 score that is higher than both their precision and their recall. I believe this is nonsensical: F1 is the harmonic mean of precision and recall, so it always lies between the two, and the same holds for any weighted F-measure, whatever alpha or beta is used.
For the Ambiguous dataset, both papers report F1 scores that are higher than expected (79.3 instead of 74.7, and 91.9 instead of 77.1).
For the ReVerb45k dataset, the CESI paper's F1 score is again higher than expected (81.9 instead of 71.9).
For the ReVerb45k dataset, the Galárraga et al. paper shows an F1 of 0.5, while the precision and recall are 71.6 and 50.8, respectively. I would have expected an F1 of 59.4, not 0.5.
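To make these checks concrete, here is a minimal Python sketch. `f_beta` follows the standard F-beta definition (beta > 1 weights recall more heavily, beta = 1 gives F1); `check_reported_f1` and its rounding tolerance `tol = 0.1` are hypothetical helpers for illustration, not anything the page or the papers use. It is applied to the Galárraga et al. ReVerb45k row, the one row above where both precision and recall are quoted.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta=1 gives F1, the harmonic mean of P and R."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


def check_reported_f1(precision: float, recall: float, reported: float,
                      tol: float = 0.1) -> str:
    """Classify a reported F1 against the listed precision and recall.

    "out of range": outside [min(P, R), max(P, R)]; no beta can produce it.
    "mismatch":     inside that range, but more than `tol` away from F1.
    "consistent":   within `tol` of F1 (tol is meant to absorb rounding).
    """
    if not min(precision, recall) <= reported <= max(precision, recall):
        return "out of range"
    if abs(reported - f_beta(precision, recall)) > tol:
        return "mismatch"
    return "consistent"


# Galárraga et al. on ReVerb45k, values as listed on the page.
p, r, reported = 71.6, 50.8, 0.5

# Any F-beta stays between recall and precision, whatever beta is chosen.
for beta in (0.1, 0.5, 1.0, 2.0, 10.0):
    assert min(p, r) <= f_beta(p, r, beta) <= max(p, r)

print(f"expected F1: {f_beta(p, r):.1f}")  # expected F1: 59.4
print(check_reported_f1(p, r, reported))   # out of range
```

The same `check_reported_f1` call would flag the Base dataset rows as "out of range", since their F1 exceeds both precision and recall.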

In short, these results do not seem correct. I don't know whether the problem lies in the F1 computation or in the reported precision and recall values.

Tom Aarsen