kermitt2 / delft

a Deep Learning Framework for Text

reader.py misses the <EX_ENAMEX> annotated entities on LeMonde Corpus

pjox opened this issue

Hello,

I recently found that the reader.py script does not parse the entities annotated as <EX_ENAMEX> in the French LeMonde corpus. As a result, it misses some entities, and it also cuts some sentences short because it ignores the text inside these tags.

This should not have a big impact, as the number of entities annotated this way is rather small, but it would still be nice to patch this little bug. 😄
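To illustrate the behaviour I would expect, here is a minimal sketch of a SAX-based reader that treats <EX_ENAMEX> the same way as <ENAMEX>. It is not the actual code from reader.py: the `EntityHandler` class, the `read_entities` function, the `type` attribute, the "MISC" fallback label and the B-/I- label scheme are all assumptions made for the example.

```python
import xml.sax


class EntityHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects (token, label) pairs from annotated text."""

    def __init__(self, entity_tags):
        super().__init__()
        self.entity_tags = set(entity_tags)  # tag names treated as entities
        self.tokens = []
        self.current_label = "O"
        self.first_in_entity = False

    def startElement(self, name, attrs):
        if name in self.entity_tags:
            # Assumption: the entity class is stored in a "type" attribute.
            self.current_label = attrs.get("type", "MISC")
            self.first_in_entity = True

    def endElement(self, name):
        if name in self.entity_tags:
            self.current_label = "O"

    def characters(self, content):
        # Text is always kept, whether or not it sits inside an entity tag.
        # Naive whitespace tokenization, good enough for a sketch.
        for token in content.split():
            if self.current_label == "O":
                self.tokens.append((token, "O"))
            else:
                prefix = "B-" if self.first_in_entity else "I-"
                self.tokens.append((token, prefix + self.current_label))
                self.first_in_entity = False


def read_entities(xml_text, entity_tags=("ENAMEX", "EX_ENAMEX")):
    """Parse an XML string and return the list of (token, label) pairs."""
    handler = EntityHandler(entity_tags)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.tokens


# Made-up sample sentence, not taken from the corpus.
sample = (
    '<sentence>Le <ENAMEX type="Organization">Monde</ENAMEX> cite '
    '<EX_ENAMEX type="Person">Jean Dupont</EX_ENAMEX> .</sentence>'
)
for token, label in read_entities(sample):
    print(token, label)
```

Passing `entity_tags=("ENAMEX",)` instead labels "Jean Dupont" as "O" in this sketch, which is roughly the missed-entity half of the problem described above.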

Thanks!

Thank you @pjox !

This has been fixed by commit 838ea35 in the bert-sequence-labeling branch, and it will be in master once that branch is merged...

merged...