sebastianruder / NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Home Page:https://nlpprogress.com/

English information extraction has incorrect F1 scores

tomaarsen opened this issue

Hello!

Short description

The information_extraction page lists F1 scores that are inconsistent with the precision and recall values shown next to them.

Assumptions

This issue assumes that the F1 score is computed as:

$$F_1 = 2 \cdot \frac{P \cdot R}{P + R}$$

Beyond that, for the rest of this issue I'm assuming that the precision and recall are correct, and I use these values to compute my own F1 score. Apologies if these assumptions are incorrect.
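For reference, the formula I'm assuming can be written as a small Python sketch. The function name `f1_score` and the example inputs are my own illustration and are not taken from the page.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values only (not from the information_extraction page).
print(round(f1_score(80.0, 60.0), 1))
```

Because this is a harmonic mean, the result always falls between the two inputs.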

Detailed concerns

  • For the Base dataset, both papers are listed with an F1 score that is higher than both the precision and the recall. I believe this to be nonsensical, even if a different alpha or beta is used in the F-measure computation, since an F-measure is a weighted harmonic mean and must lie between the precision and the recall.
  • For the Ambiguous dataset, both papers' F1 scores are higher than expected (79.3 instead of 74.7, and 91.9 instead of 77.1).
  • For the ReVerb45k dataset, the CESI paper's F1 score is again higher than expected (81.9 instead of 71.9).
  • For the ReVerb45k dataset, the Galárraga et al. paper shows an F1 of 0.5, while the precision and recall are 71.6 and 50.8, respectively. I would have expected an F1 of 59.4, not 0.5.
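The checks above can be reproduced with a short sketch. It recomputes the expected F1 for the Galárraga et al. row on ReVerb45k from the precision and recall quoted in this issue, and confirms that no choice of beta moves the F-measure outside the range between precision and recall. The helper name `f_beta` and the list of beta values are my own choices, assuming the usual F-beta definition.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


# Precision and recall as quoted in this issue (ReVerb45k, Galárraga et al.).
p, r = 71.6, 50.8
print(round(f_beta(p, r), 1))  # 59.4

# Whatever beta is used, the score stays between recall and precision.
for beta in (0.25, 0.5, 1.0, 2.0, 4.0):
    score = f_beta(p, r, beta)
    assert min(p, r) <= score <= max(p, r)
```

This is why an F1 score above both precision and recall cannot be explained by a different weighting alone.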

In short, these results do not seem correct. I do not know whether the error lies in the F1 scores themselves or in the listed precision and recall values.

- Tom Aarsen