nikitakit / self-attentive-parser

High-accuracy NLP parser with models for 11 languages.

Home Page: https://parser.kitaev.io/

Difference in parse results compared to the demo page

alexbenari opened this issue

I am getting a sub-optimal parse for a certain sentence, while on the demo page (https://parser.kitaev.io/) it is parsed correctly.
What could be the source of the difference?

The sentence is:
Its more easy to write him an email
I tried both benepar_en3 and benepar_en3_large. Both models tag "Its" as VBZ, while the demo page tags it as PRP.

Any idea what I need to change to get the parse I see on the demo page?
I am using a custom tokenizer, but I doubt it makes a difference for this specific sentence, since each word is a single token, which I expect is also the case with spaCy. I also made sure I am using the latest versions of the models.
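For reference, this is roughly how I am invoking the parser. It is a minimal repro that leaves out my custom tokenizer and assumes the en_core_web_md spaCy model (the pipeline setup follows the benepar README):

```python
import benepar, spacy

benepar.download("benepar_en3")  # no-op if the model is already cached

# Plain spaCy pipeline, without my custom tokenizer
nlp = spacy.load("en_core_web_md")
nlp.add_pipe("benepar", config={"model": "benepar_en3"})

doc = nlp("Its more easy to write him an email")
sent = list(doc.sents)[0]
print(sent._.parse_string)  # "Its" comes out tagged VBZ here
```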

A quick follow-up: I am seeing more and more such examples. Here is an extremely simple one:
What happened
The site demo parses it as SBARQ, while my code running the same parser produces SBAR.

So puzzling! I would really appreciate any insight into this. I feel I have exhausted my options.

The demo is running the benepar_en2 model (see the "About" section on the demo page), but it looks like you're using benepar_en3/benepar_en3_large.

Some variation between different training runs is inevitable, even for models based on an identical architecture. In this case there is further variation because the newer parser models are based on T5, while the demo page runs a BERT-based model. You can also try benepar_en3_wsj, which is also BERT-based.
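If it helps, you can compare the en3 variants side by side with something along these lines (a rough sketch; it assumes en_core_web_md is installed and downloads each parser model on first use):

```python
import benepar, spacy

SENTENCE = "What happened"

for model in ("benepar_en3", "benepar_en3_large", "benepar_en3_wsj"):
    benepar.download(model)  # no-op if the model is already cached
    nlp = spacy.load("en_core_web_md")
    nlp.add_pipe("benepar", config={"model": model})
    doc = nlp(SENTENCE)
    print(model, list(doc.sents)[0]._.parse_string)
```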

The benepar_en2 model only works with a prior benepar release (0.1.3). The old release should still work if you pin dependencies appropriately, but it's reached end-of-life and will not be receiving any further maintenance.
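Roughly, using the old release looks like this (a sketch only; 0.1.3 uses the standalone benepar.Parser API rather than the spaCy plugin, and you may need to pin an older TensorFlow as well):

```python
# pip install benepar==0.1.3  (plus compatible dependency pins)
import benepar

benepar.download("benepar_en2")
parser = benepar.Parser("benepar_en2")
tree = parser.parse("Its more easy to write him an email")
print(tree)
```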

Thanks a lot! Indeed, I missed the benepar_en2 mention in the "About" section, my bad. I will give benepar_en3_wsj a try.