nikitakit / self-attentive-parser

High-accuracy NLP parser with models for 11 languages.

Home Page: https://parser.kitaev.io/

Buggy output for parentheticals

trangham283 opened this issue

Hi, I'm using the pretrained benepar model as described in Usage with NLTK. For parentheticals, it does not produce (-LRB- -LRB-)/(-RRB- -RRB-) the way other standard parsers do. For example, parsing this sentence:

Representative George Hansen (R., Idaho) drew a reprimand in nineteen eighty-four after a felony conviction for falsifying his financial disclosures.

gives

(S
  (NP
    (NP (JJ Representative) (NNP George) (NNP Hansen))
    (PRN (( () (NP (NNP R.)) (, ,) (NP (NNP Idaho)) () ))))
  (VP
    (VBD drew)
    (NP (DT a) (NN reprimand))
    (PP (IN in) (NP (JJ nineteen) (JJ eighty-four)))
    (PP
      (IN after)
      (NP
        (NP (DT a) (NN felony) (NN conviction))
        (PP
          (IN for)
          (S
            (VP
              (VBG falsifying)
              (NP (PRP$ his) (JJ financial) (NNS disclosures))))))))
  (. .))

The empty labels are particularly problematic when used with the trees.py module in this repo. Is this a bug or is this your own label convention?
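To illustrate why the empty labels matter, here is a minimal sketch of what a bracket reader sees in the PRN subtree. It is not the trees.py module from this repo; `read_labels` is a hypothetical helper that only collects the token following each opening bracket, which is where a reader would expect a label.

```python
import re

def read_labels(bracketed):
    """Return the label following each opening bracket ('' if there is none)."""
    # Split into single brackets and runs of non-space, non-bracket characters.
    tokens = re.findall(r"[()]|[^\s()]+", bracketed)
    labels = []
    for i, tok in enumerate(tokens):
        if tok == "(":
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            # A bracket right after "(" means no label was found.
            labels.append("" if nxt in {"(", ")", ""} else nxt)
    return labels

# PRN subtree as printed in the output above (unescaped parentheses).
buggy = "(PRN (( () (NP (NNP R.)) (, ,) (NP (NNP Idaho)) () )))"
# The same subtree with the escapes other parsers produce.
escaped = "(PRN (-LRB- -LRB-) (NP (NNP R.)) (, ,) (NP (NNP Idaho)) (-RRB- -RRB-))"

print(read_labels(buggy))
print(read_labels(escaped))
```

With the unescaped form, several brackets are followed directly by another bracket, so the reader records empty labels; with the escaped form every bracket has a label.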

This is fixed in the v0.1.0 release today (at least for English). Thank you for pointing this out!

The issue here was that parentheses were printed unescaped, as (( () instead of (-LRB- -LRB-).
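As a rough sketch of the escaping described here (not the actual change made in v0.1.0), bracket tokens can be mapped to their Penn Treebank names before a leaf is printed. The names `PTB_ESCAPES`, `escape`, and `leaf` are hypothetical and exist only for this illustration.

```python
# Assumed mapping: the usual Penn Treebank escapes for round brackets.
PTB_ESCAPES = {"(": "-LRB-", ")": "-RRB-"}

def escape(token):
    """Return the escaped form of a bracket token, or the token unchanged."""
    return PTB_ESCAPES.get(token, token)

def leaf(tag, word):
    """Format one (tag word) leaf, escaping both the tag and the word."""
    return "({} {})".format(escape(tag), escape(word))

# The parenthetical from the example sentence, as (tag, word) pairs.
tokens = [("(", "("), ("NNP", "R."), (",", ","), ("NNP", "Idaho"), (")", ")")]
print(" ".join(leaf(tag, word) for tag, word in tokens))
# prints: (-LRB- -LRB-) (NNP R.) (, ,) (NNP Idaho) (-RRB- -RRB-)
```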