ecohealthalliance / EpiTator

EpiTator annotates epidemiological information in text documents. It is the natural language processing framework that powers GRITS and EIDR Connect.

Home Page: https://epitator.readthedocs.io/en/latest/index.html

annodoc to_json() does not work

aauss opened this issue

There seems to be a to_json() method in annodoc.py, but calling it on an AnnoDoc object raises an error when I run the following code

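from epitator.annotator import AnnoDoc
from epitator.count_annotator import CountAnnotator
from epitator.date_annotator import DateAnnotator
from epitator.geoname_annotator import GeonameAnnotator
from epitator.resolved_keyword_annotator import ResolvedKeywordAnnotator


def annotate(text):
    '''Return a document annotated for dates, disease counts, diseases, and geonames.

    text -- a string to be annotated
    '''
    doc = AnnoDoc(text)
    # Each annotator adds one or more tiers of spans to the document.
    doc.add_tiers(GeonameAnnotator())
    doc.add_tiers(ResolvedKeywordAnnotator())
    doc.add_tiers(CountAnnotator())
    doc.add_tiers(DateAnnotator())
    return doc


doc = annotate("There are five cases of ebola in Bavaria")
doc.to_json()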

with the following error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-3b5d363834f0> in <module>
      1 doc = annotate("There are five cases of ebola in Bavaria")
----> 2 doc.to_json()

/anaconda3/envs/RKI/lib/python3.6/site-packages/epitator/annodoc.py in to_json(self)
     82         json_obj['tiers'] = {}
     83         for name, tier in self.tiers.items():
---> 84             json_obj['tiers'][name] = tier.to_json()
     85 
     86         return json.dumps(json_obj)

/anaconda3/envs/RKI/lib/python3.6/site-packages/epitator/annotier.py in to_json(self)
     43         docless_spans = []
     44         for span in self.spans:
---> 45             span_dict = span.__dict__.copy()
     46             del span_dict['doc']
     47             docless_spans.append(span_dict)

AttributeError: 'SentSpan' object has no attribute '__dict__'
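
A likely cause, assuming SentSpan (or a class it inherits from) declares __slots__: Python does not create a per-instance __dict__ for such classes, so span.__dict__.copy() fails. The sketch below reproduces this with a hypothetical stand-in class, SlottedSpan, and shows one slot-aware way to build the dictionary. The class and the helper span_to_dict are illustrative assumptions, not EpiTator code and not the fix that was applied.

```python
class SlottedSpan:
    """Hypothetical stand-in for a span class that declares __slots__."""
    __slots__ = ("start", "end", "doc")

    def __init__(self, start, end, doc):
        self.start = start
        self.end = end
        self.doc = doc


def span_to_dict(span, exclude=("doc",)):
    """Collect attributes from __slots__ along the MRO, plus __dict__ if one exists."""
    result = {}
    for cls in type(span).__mro__:
        slots = getattr(cls, "__slots__", ())
        if isinstance(slots, str):  # __slots__ may be a single string
            slots = (slots,)
        for name in slots:
            if name not in exclude and hasattr(span, name):
                result[name] = getattr(span, name)
    # Classes without __slots__ keep their attributes in __dict__.
    for name, value in getattr(span, "__dict__", {}).items():
        if name not in exclude:
            result[name] = value
    return result


span = SlottedSpan(10, 14, doc="There are five cases of ebola in Bavaria")
has_dict = hasattr(span, "__dict__")
span_dict = span_to_dict(span)
print(has_dict)   # False
print(span_dict)  # {'start': 10, 'end': 14}
```

Dropping the doc attribute mirrors the del span_dict['doc'] step in the traceback, which avoids serializing the whole document once per span.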

Thanks for reporting the error. I have a potential fix here that replaces the to_json function with to_dict: #42
If you're building an API around EpiTator, I think it will serve that purpose well, but you would be better off using a pickling library if you want to be able to save and restore AnnoDoc objects. If you're satisfied with the patch, I will deploy it to PyPI. Let us know.
Regards,
-Nathan
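
The trade-off described above can be illustrated with the standard library alone. This is a minimal sketch using a made-up record; the structure that to_dict actually returns may differ, and the field names here are assumptions. JSON is convenient for an API response but does not preserve Python types, while pickle restores the objects unchanged.

```python
import json
import pickle
from datetime import datetime

# Hypothetical annotation record; not the real to_dict output.
record = {
    "text": "There are five cases of ebola in Bavaria",
    "span": (10, 14),
    "created": datetime(2019, 1, 1),
}

# JSON: fine for an API, but tuples become lists and datetimes
# must be converted (here to strings via default=str).
as_json = json.dumps(record, default=str)
from_json = json.loads(as_json)

# pickle: round-trips the Python objects with their types intact.
from_pickle = pickle.loads(pickle.dumps(record))

print(from_json["span"])      # [10, 14]
print(from_json["created"])   # 2019-01-01 00:00:00
print(from_pickle == record)  # True
```

Only unpickle data from a trusted source, since loading a pickle can execute arbitrary code.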

Hello Nathan,
I think the to_dict() function is nice. Originally I used the to_json() function to get an overview of what information is stored in an AnnoDoc, so I didn't mean to serialize the object this way. Getting more information is part of the other issue, which has been resolved. Thank you very much!
Regards,
Auss