Buggy output for parentheticals

Question

trangham283 opened this issue 6 years ago

Hi, I'm using the pretrained benepar model as described in "Usage with NLTK". Unlike other standard parsers, it does not produce (-LRB- -LRB-)/(-RRB- -RRB-) for the parentheses in parentheticals. For example, parsing this sentence:

Representative George Hansen (R., Idaho) drew a reprimand in nineteen eighty-four after a felony conviction for falsifying his financial disclosures.

gives

(S
  (NP
    (NP (JJ Representative) (NNP George) (NNP Hansen))
    (PRN (( () (NP (NNP R.)) (, ,) (NP (NNP Idaho)) () ))))
  (VP
    (VBD drew)
    (NP (DT a) (NN reprimand))
    (PP (IN in) (NP (JJ nineteen) (JJ eighty-four)))
    (PP
      (IN after)
      (NP
        (NP (DT a) (NN felony) (NN conviction))
        (PP
          (IN for)
          (S
            (VP
              (VBG falsifying)
              (NP (PRP$ his) (JJ financial) (NNS disclosures))))))))
  (. .))

The empty labels are particularly problematic when used with the trees.py module in this repo. Is this a bug or is this your own label convention?

Nikita Kitaev · Answer 1 · Tue Jan 01 2019 03:59:58 GMT+0800 (China Standard Time)

This is fixed in the v0.1.0 release today (at least for English). Thank you for pointing this out!

The issue here was that parentheses were printed unescaped as (( () instead of (-LRB- -LRB-).
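For anyone stuck on an older version, here is a minimal sketch of the escaping idea in Python. It is not the actual benepar fix. The names `escape_brackets`, `format_leaf` and `repair_brackets` are hypothetical, and the repair step assumes the buggy leaves appear exactly as `(( ()` and `() ))`, as in the output above.

```python
# Penn Treebank convention: literal brackets are written as -LRB- / -RRB-
# so they cannot be confused with the brackets that delimit the tree.
PTB_ESCAPES = {"(": "-LRB-", ")": "-RRB-"}


def escape_brackets(symbol: str) -> str:
    """Return the PTB escape for a bare bracket, or the symbol unchanged."""
    return PTB_ESCAPES.get(symbol, symbol)


def format_leaf(tag: str, word: str) -> str:
    """Print one (tag word) leaf with both the tag and the word escaped."""
    return f"({escape_brackets(tag)} {escape_brackets(word)})"


def repair_brackets(tree_str: str) -> str:
    """Patch already-printed output that contains unescaped bracket leaves.

    Assumption: the buggy leaves look exactly like '(( ()' and '() ))'.
    Neither sequence can occur in a well-formed bracketed tree.
    """
    return (
        tree_str
        .replace("(( ()", "(-LRB- -LRB-)")
        .replace("() ))", "(-RRB- -RRB-)")
    )


if __name__ == "__main__":
    buggy = "(PRN (( () (NP (NNP R.)) (, ,) (NP (NNP Idaho)) () ))))"
    print(repair_brackets(buggy))
```

Escaping when the tree is printed (as in `format_leaf`) is the more robust option. The string-level repair is only a stopgap for output that has already been written out.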