Multiple occurrences of a word not handled properly while creating tuples

Question

abhaga opened this issue 13 years ago

If a word occurs more than once in a sentence, the lack of word IDs makes it impossible to correctly identify the source and target of a dependency.

If you are open to accepting a patch for this, I can submit one. My idea is to keep the IDs in the "tuples" and store each word's dependents in the "words" array.
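A minimal sketch of that proposal, assuming the parser emits dependencies in the Stanford typed-dependency text form `rel(governor-index, dependent-index)`, e.g. `det(dog-2, the-1)`. The function `parse_dependencies` and the dictionary layout of each entry in `words` are hypothetical illustrations, not the wrapper's actual code or output format. In the sentence "the dog chased the dog", both `det` relations would collapse to `("det", "dog", "the")` without IDs; keeping the indices tells them apart.

```python
import re

# Matches lines such as "det(dog-2, the-1)": relation, governor-index, dependent-index.
# Greedy (.+) plus backtracking takes the index after the LAST "-", so hyphenated
# words like "well-known-3" still parse; trailing "'" marks on copied nodes are ignored.
DEP_RE = re.compile(r"^([^(]+)\((.+)-(\d+)'*, (.+)-(\d+)'*\)$")


def parse_dependencies(lines, tokens):
    """Return (tuples, words) with word IDs preserved (hypothetical layout)."""
    # words[i] describes token i + 1 (indices are 1-based; 0 is the artificial ROOT).
    words = [{"word": w, "id": i + 1, "dependents": []} for i, w in enumerate(tokens)]
    tuples = []
    for line in lines:
        m = DEP_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a dependency line
        rel, gov, gov_id, dep, dep_id = m.groups()
        gov_id, dep_id = int(gov_id), int(dep_id)
        tuples.append((rel, (gov, gov_id), (dep, dep_id)))
        if gov_id > 0:  # ROOT has no entry in words
            words[gov_id - 1]["dependents"].append((rel, dep_id))
    return tuples, words


# Usage: the two "dog" tokens (IDs 2 and 5) each keep their own determiner.
tokens = ["the", "dog", "chased", "the", "dog"]
deps = [
    "root(ROOT-0, chased-3)",
    "det(dog-2, the-1)",
    "nsubj(chased-3, dog-2)",
    "det(dog-5, the-4)",
    "dobj(chased-3, dog-5)",
]
tuples, words = parse_dependencies(deps, tokens)
print(words[1]["dependents"])  # dependents of dog-2
print(words[4]["dependents"])  # dependents of dog-5
```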

Dustin Smith · Answer 1 · Tue Oct 11 2011 02:12:41 GMT+0800 (China Standard Time)

Hi Abhaya,

Thanks for bringing this to my attention and for submitting the patch for the previous regular expression bug. In addition to dropping word IDs, the current code ignores the sentence IDs that are used to resolve coreferences between sentences. I agree that putting the ID into the word dictionary is a good idea; maybe a (word, id) tuple?
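A short sketch of why sentence IDs matter alongside word IDs, assuming tokens are stored as (word, id) tuples and a coreference link is expressed as a pair of hypothetical `(sentence_id, word_id)` keys. Sentences are numbered from 0 here purely for convenience; the parser's own numbering may differ. A word ID alone is only unique within its sentence, so the index below is keyed on both.

```python
# Hypothetical input: each sentence is a list of (word, id) tuples with 1-based ids.
sentences = [
    [("Alice", 1), ("filed", 2), ("a", 3), ("bug", 4)],
    [("She", 1), ("also", 2), ("sent", 3), ("a", 4), ("patch", 5)],
]

# Key every token by (sentence_id, word_id) so the key is unique across the document.
index = {
    (sent_id, word_id): word
    for sent_id, tokens in enumerate(sentences)
    for word, word_id in tokens
}

# Hypothetical coreference link: "She" (sentence 1, word 1) refers to
# "Alice" (sentence 0, word 1). Word ID 1 alone would be ambiguous.
coref = [((1, 1), (0, 1))]

for mention, antecedent in coref:
    print(index[mention], "->", index[antecedent])  # prints "She -> Alice"
```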

I am preoccupied for at least two weeks, but if you end up writing something to do this, I'll incorporate your patch. Thanks for the help.

Dustin

Dustin Smith · Answer 2 · Thu Aug 16 2012 17:28:21 GMT+0800 (China Standard Time)

This is fixed in the current release.

Bob Lannon · Answer 3 · Sat Dec 08 2012 03:32:27 GMT+0800 (China Standard Time)

Not sure this is actually fixed!

I can see the word IDs in the server's output, but the JSON my Python code receives doesn't include them. Could a later change have broken this?