smartschat / cort

A toolkit for coreference resolution and error analysis.


ParentedTree creeps into "head" attribute

minhlab opened this issue

When I read this document from CoNLL-2012 into cort, a TypeError is thrown. A ParentedTree ends up in the "head" attribute in mention_property_computer.py, around line 241 (head = [head_tree[0]]). The value can be traced back to head_finder, but I stopped there because it has a lot of alternative rules.

>>> from cort.core.corpora import Corpus
>>> with open('output/debug.conll') as f:
...     Corpus.from_file('test', f)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/minhle/.local/lib/python3.5/site-packages/cort-0.2.4.5-py3.5.egg/cort/core/corpora.py", line 79, in from_file
    documents.append(from_string("".join(current_document)))
  File "/home/minhle/.local/lib/python3.5/site-packages/cort-0.2.4.5-py3.5.egg/cort/core/corpora.py", line 14, in from_string
    return documents.CoNLLDocument(string)
  File "/home/minhle/.local/lib/python3.5/site-packages/cort-0.2.4.5-py3.5.egg/cort/core/documents.py", line 414, in __init__
    super(CoNLLDocument, self).__init__(identifier, sentences, coref)
  File "/home/minhle/.local/lib/python3.5/site-packages/cort-0.2.4.5-py3.5.egg/cort/core/documents.py", line 97, in __init__
    self.annotated_mentions = self.__get_annotated_mentions()
  File "/home/minhle/.local/lib/python3.5/site-packages/cort-0.2.4.5-py3.5.egg/cort/core/documents.py", line 111, in __get_annotated_mentions
    span, self, first_in_gold_entity=set_id not in seen
  File "/home/minhle/.local/lib/python3.5/site-packages/cort-0.2.4.5-py3.5.egg/cort/core/mentions.py", line 174, in from_document
    mention_property_computer.compute_gender(attributes)
  File "/home/minhle/.local/lib/python3.5/site-packages/cort-0.2.4.5-py3.5.egg/cort/core/mention_property_computer.py", line 91, in compute_gender
    if __wordnet_lookup_gender(" ".join(attributes["head"])):
TypeError: sequence item 0: expected str instance, ParentedTree found

More precisely, the problematic head is ['in', 'which']. As a workaround for this case, I use head = head_tree.leaves() instead of head = [head_tree[0]].
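To illustrate why indexing fails here while leaves() works, below is a minimal sketch. It does not use cort or nltk: MiniTree is a hypothetical stand-in that only mimics child indexing and leaves() of a ParentedTree, and the shape of the "in which" subtree is my assumption, since I did not check which node head_finder actually returns.

```python
class MiniTree(list):
    """Hypothetical stand-in for a ParentedTree: a labelled list of children."""

    def __init__(self, label, children):
        super().__init__(children)
        self.label = label

    def leaves(self):
        """Return all token strings under this node, left to right."""
        tokens = []
        for child in self:
            if isinstance(child, MiniTree):
                tokens.extend(child.leaves())
            else:
                tokens.append(child)
        return tokens


# Usual case: the head is a preterminal, so head_tree[0] is already a string.
preterminal = MiniTree("NN", ["dog"])
assert [preterminal[0]] == ["dog"]

# Assumed shape of the problematic head: a phrase whose children are subtrees.
phrase = MiniTree("WHPP", [
    MiniTree("IN", ["in"]),
    MiniTree("WHNP", [MiniTree("WDT", ["which"])]),
])

bad_head = [phrase[0]]       # a subtree, not a string
good_head = phrase.leaves()  # flattened to the token strings

# Joining a list that contains a tree raises TypeError, as in the traceback.
try:
    " ".join(bad_head)
    join_failed = False
except TypeError:
    join_failed = True

joined = " ".join(good_head)
```

If the head node is always either a preterminal or a phrase, leaves() covers both cases, because for a preterminal it returns the same one-element list that [head_tree[0]] would.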