jmbanda / BLAH2015

Source code generated during BLAH 2015

BLAH2015 - Annotation group tools

NCBOannotate.py

This tool takes an XML file in which each document has an identifier (id), a PubMed Central ID (pmcid) and its text (abstractText), and runs the text through the NCBO Annotator using the specified ontologies.
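
The README names the three input fields but not the surrounding XML layout. The sketch below reads documents with the standard library's ElementTree, assuming a hypothetical layout with one <document> element per record inside a <documents> root; those element names and the sample values are placeholders, not taken from test_lines.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical input layout: only the field names (id, pmcid, abstractText)
# come from the README; <documents>/<document> and the values are placeholders.
SAMPLE = """<documents>
  <document>
    <id>1</id>
    <pmcid>PMC0000000</pmcid>
    <abstractText>Aspirin was given to patients.</abstractText>
  </document>
</documents>"""


def read_documents(xml_text):
    """Yield (id, pmcid, abstractText) tuples from an XML string."""
    root = ET.fromstring(xml_text)
    for doc in root.iter("document"):
        # findtext returns the default when a field is missing.
        yield (
            doc.findtext("id", default=""),
            doc.findtext("pmcid", default=""),
            doc.findtext("abstractText", default=""),
        )
```

If the real file uses different element names, only the strings passed to iter() and findtext() need to change.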

Execution call (using RxNorm and CHEBI ontologies):

python NCBOannotate.py RXNORM,CHEBI test_lines.xml

In this case the output file is test_lines.xml.out. It lists every annotation found, one per line, in the following tab-separated format:

document_id \t document_pmcid \t term_code_ontology_url \t ontology_url \t preferred_term_text \t term_start_character_offset \t term_end_character_offset \t match_type
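
As an illustration of how one output row could be assembled, the sketch below assumes each annotator result follows the BioPortal Annotator JSON shape (an annotatedClass with "@id", an optional "prefLabel" and links.ontology, plus a list of annotations with from, to, matchType and text). The README does not say how NCBOannotate.py reads the response. The function name format_rows and the example.org URLs are hypothetical.

```python
def format_rows(doc_id, pmcid, results):
    """Turn annotator results into tab-separated output lines (hypothetical helper).

    Assumes each result looks roughly like:
    {"annotatedClass": {"@id": ..., "prefLabel": ..., "links": {"ontology": ...}},
     "annotations": [{"from": ..., "to": ..., "matchType": ..., "text": ...}]}
    """
    rows = []
    for result in results:
        cls = result["annotatedClass"]
        term_url = cls["@id"]
        ontology_url = cls.get("links", {}).get("ontology", "")
        for ann in result.get("annotations", []):
            # Fall back to the matched text when no prefLabel is present.
            label = cls.get("prefLabel", ann.get("text", ""))
            fields = [doc_id, pmcid, term_url, ontology_url, label,
                      ann["from"], ann["to"], ann.get("matchType", "")]
            rows.append("\t".join(str(f) for f in fields))
    return rows
```

One result with several annotations produces several rows, one per matched span, which matches the one-annotation-per-line layout above.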

To use more than one ontology, pass their acronyms as a single comma-separated argument, as in RXNORM,CHEBI above.
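
A minimal sketch of turning that comma-separated argument into an annotator request follows. It assumes the public BioPortal REST Annotator endpoint with its text, ontologies and apikey query parameters. The README does not say which URL NCBOannotate.py calls or how it supplies an API key, so ANNOTATOR_URL and both helper names are hypothetical. The sketch only builds the URL and does not send it.

```python
from urllib.parse import urlencode

# Assumed endpoint (public BioPortal REST Annotator); not confirmed by the README.
ANNOTATOR_URL = "https://data.bioontology.org/annotator"


def parse_ontologies(arg):
    """Split 'RXNORM,CHEBI' into ['RXNORM', 'CHEBI'], dropping stray spaces."""
    return [o.strip() for o in arg.split(",") if o.strip()]


def build_request_url(text, ontologies, api_key):
    """Build a GET URL for annotating `text` against the given ontologies."""
    params = {"text": text, "ontologies": ",".join(ontologies), "apikey": api_key}
    # urlencode percent-encodes the comma as %2C, which servers decode normally.
    return ANNOTATOR_URL + "?" + urlencode(params)
```

For example, parse_ontologies(sys.argv[1]) would read the first command-line argument of the call shown above.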

dictionary_annotate.py

This tool uses a dictionary to annotate any given text. A dictionary is a tab-delimited file with one term per line, in the form term_identifier \t term_text \n. We have included dictionaries extracted from the RxNorm, Phenominer, ChEBI, FMA and PATO ontologies.
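
Loading a dictionary in that format takes only a few lines. The sketch below is one way to do it, not necessarily how the script reads its dictionaries. The helper name load_dictionary is hypothetical, and it assumes UTF-8 files, since the README does not state an encoding.

```python
def load_dictionary(lines):
    """Read 'term_identifier<TAB>term_text' lines into a list of (id, text) pairs.

    `lines` can be an open file, e.g. open("CHEBI-dic.csv", encoding="utf-8")
    (UTF-8 is an assumption), or any iterable of strings.
    """
    entries = []
    for line in lines:
        # Split only on the first tab so term text may contain further tabs.
        parts = line.rstrip("\r\n").split("\t", 1)
        # Skip blank or malformed lines that lack a term_text column.
        if len(parts) == 2 and parts[1].strip():
            entries.append((parts[0].strip(), parts[1].strip()))
    return entries
```

Splitting by hand rather than with the csv module avoids misreading term text that happens to contain quote characters.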

Execution call (using CHEBI dictionary):

python annotate_dictionary.py CHEBI-dic.csv test_lines.xml

In this case the output file is test_lines.xml-CHEBI-dic.csv.txt. It lists every match found, one per line, in the following tab-separated format:

document_id \t document_PMCID \t dictionary_term_id \t dictionary_term_text \t character_offset_start \t character_offset_end \n
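
The README does not describe how terms are matched. The sketch below uses a naive approach instead: case-insensitive, whole-word exact matching with the re module, with end offsets exclusive in Python slice style. It produces rows in the format above and derives the output name from the single example given, input file + "-" + dictionary file + ".txt". Both helper names are hypothetical.

```python
import re


def annotate(doc_id, pmcid, text, entries):
    """Return one tab-separated row per case-insensitive whole-word match.

    Naive exact matching; the actual strategy of the script is not documented.
    Offsets are 0-based, and the end offset is exclusive.
    """
    rows = []
    for term_id, term_text in entries:
        # Lookarounds instead of \b so terms starting or ending with
        # punctuation still match at word edges.
        pattern = r"(?<!\w)" + re.escape(term_text) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            rows.append("\t".join([doc_id, pmcid, term_id, term_text,
                                   str(m.start()), str(m.end())]))
    return rows


def output_name(xml_path, dict_path):
    """Output file name, inferred from the README's single example."""
    return f"{xml_path}-{dict_path}.txt"
```

Scanning once per dictionary term is simple but slow for large dictionaries. A trie or Aho-Corasick matcher would scale better.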

test_lines.xml

We have included the test_lines.xml file, which contains the abstract of one randomly selected Europe PubMed Central article.

Dictionaries

We have included dictionaries for the following ontologies:

CHEBI (https://github.com/jmbanda/BLAH2015/blob/master/CHEBI-dic.csv)

FMA (https://github.com/jmbanda/BLAH2015/blob/master/FMA-dic.csv)

PATO (https://github.com/jmbanda/BLAH2015/blob/master/PATO-dic.csv)

PHENOMINER (https://github.com/jmbanda/BLAH2015/blob/master/PHENOM-dic.csv)

RXNORM (https://github.com/jmbanda/BLAH2015/blob/master/RXNORM-dic.csv)

About

License: MIT License
