INK-USC / PLE

Label Noise Reduction in Entity Typing (KDD'16)

BBN entity mention has negative index

wujsAct opened this issue

Why does the BBN test.json contain negative indexes for an entity mention? Here is an example:
{"tokens": ["In", "1973", ",", "Wells", "Fargo", "&", "amp", ";", "Co.", "of", "San", "Francisco", "launched", "the", "Gold", "Account", ",", "which", "included", "free", "checking", ",", "a", "credit", "card", ",", "safe-deposit", "box", "and", "travelers", "checks", "for", "a", "$", "3", "monthly", "fee", "."], "senid": 24, "mentions": [{"start": -1, "labels": ["/ORGANIZATION/CORPORATION", "/ORGANIZATION"], "end": -1}, {"start": 10, "labels": ["/GPE/CITY", "/GPE"], "end": 12}], "fileid": "WSJ0085"}

@wujsAct, thank you for pointing out this issue. We found that a library used in the previous data processing pipeline failed to handle some special characters, which led to this problem. It affected only a very small part of the dataset. In the previous version of the dataset, gold-standard mentions with negative indexes were not included in the evaluation; with the fix in place, they are included. We have updated the dataset and its download link.
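For anyone still working with an older copy of the data, here is a minimal sketch of how the affected mentions could be located. It is not part of the PLE code base. It assumes the file stores one JSON object per line with the `tokens`, `mentions`, `senid` and `fileid` fields shown in the example above; the function name `find_negative_mentions` is hypothetical.

```python
import json


def find_negative_mentions(lines):
    """Yield (fileid, senid, mention) for each mention with a negative index.

    `lines` is any iterable of strings, each assumed to hold one JSON
    sentence record (an assumption based on the example in this thread).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sentence = json.loads(line)
        for mention in sentence.get("mentions", []):
            # A negative start or end marks a mention whose span was lost.
            if mention["start"] < 0 or mention["end"] < 0:
                yield sentence.get("fileid"), sentence.get("senid"), mention
```

To scan a local copy, pass an open file handle, for example `list(find_negative_mentions(open("test.json", encoding="utf-8")))`, where the path is whatever your copy is called. Re-downloading the updated dataset remains the simpler fix.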