dasmith / stanford-corenlp-python

Python wrapper for Stanford CoreNLP tools v3.4.1

Multiple occurrences of a word not handled properly while creating tuples

abhaga opened this issue

If a word occurs more than once in a sentence, the lack of IDs makes it impossible to correctly identify the source and target of a dependency.
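To make the ambiguity concrete, here is a small illustration. The sentence, the dependency list, and the (relation, governor, dependent) tuple layout are all made up for the example and are not taken from the wrapper's actual output.

```python
# Hypothetical dependencies for "the dog saw the dog", with 1-based token
# indices kept alongside each word. The layout is an assumption.
with_ids = [
    ("det", ("dog", 2), ("the", 1)),
    ("nsubj", ("saw", 3), ("dog", 2)),
    ("det", ("dog", 5), ("the", 4)),
    ("dobj", ("saw", 3), ("dog", 5)),
]

# Dropping the indices leaves only the word strings.
without_ids = [(rel, gov[0], dep[0]) for rel, gov, dep in with_ids]

# With indices, all four dependencies are distinct. Without them, the two
# "det" entries become identical, and nothing says which "dog" is the
# subject and which is the object.
distinct_with = len(set(with_ids))
distinct_without = len(set(without_ids))
print(distinct_with, distinct_without)
```

Once the indices are gone, both `det` entries read `("det", "dog", "the")`, so there is no way to recover which token each one refers to.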

If you are open to accepting a patch for this, I can submit one. My idea is to keep the ids in the "tuples" and store the dependents of a word in the "words" array.
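A rough sketch of what that could look like, not the actual patch. It assumes the parser prints dependencies in the usual `rel(governor-N, dependent-M)` text form. The names `DEP_RE`, `parse_dependency`, and `attach_dependents`, and the dictionary keys, are hypothetical and not part of the wrapper.

```python
import re

# Matches lines such as "nsubj(saw-3, dog-2)". A trailing prime after an
# index (used for copied nodes) is tolerated and dropped.
DEP_RE = re.compile(r"^(\S+?)\((.+)-(\d+)'*, (.+)-(\d+)'*\)$")


def parse_dependency(line):
    """Return (rel, (governor, id), (dependent, id)), or None if no match."""
    match = DEP_RE.match(line.strip())
    if match is None:
        return None
    rel, gov, gov_id, dep, dep_id = match.groups()
    return (rel, (gov, int(gov_id)), (dep, int(dep_id)))


def attach_dependents(tokens, tuples):
    """Build one dict per token and record each governor's dependents by ID."""
    words = [
        {"word": tok, "id": i, "dependents": []}
        for i, tok in enumerate(tokens, start=1)
    ]
    for rel, (_, gov_id), (_, dep_id) in tuples:
        if gov_id > 0:  # index 0 is the artificial ROOT node, not a token
            words[gov_id - 1]["dependents"].append((rel, dep_id))
    return words


lines = [
    "det(dog-2, the-1)",
    "nsubj(saw-3, dog-2)",
    "root(ROOT-0, saw-3)",
    "det(dog-5, the-4)",
    "dobj(saw-3, dog-5)",
]
tuples = [parse_dependency(line) for line in lines]
words = attach_dependents("the dog saw the dog".split(), tuples)
```

Here the two occurrences of "dog" stay separate: `words[1]` (ID 2) and `words[4]` (ID 5) each carry their own `det` dependent, and "saw" lists both by ID.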

Hi Abhaya,

Thanks for bringing this to my attention and submitting the patch for the previous regular expression bug. In addition to tracking word ids, the current code ignores the sentence ids that are used to resolve coreferences between sentences. I agree that putting the ID into the word dictionary is a good idea -- maybe a (word, id) tuple?

I am preoccupied for at least two weeks, but if you end up writing something to do this, I'll incorporate your patch. Thanks for the help.

Dustin

This is fixed in the current release.

Not sure this is actually fixed!

I can see the word IDs in the server's output, but the JSON that I receive in my Python code doesn't have the IDs. Is there a chance that this was broken by a subsequent change?