serenalotreck / knowledge-graph

Code exploring the creation of knowledge graphs from plant science papers

knowledge-graph

NOTE: As of 07 December 2022, I am working in the subsetted repo, pickle-corpus-code.

Code for the projects "Straying off-topic: Automated information extraction and knowledge graph construction in the plant sciences with out-of-domain pre-trained models" and "In a PICKLE: Entity and relation annotation guidelines for the molecular plant sciences".

Repository contents

  • annotation: Contains scripts for annotation-related tasks. These include utility scripts (abstract_scripts), experimental scripts (verb_annotations), scripts to calculate inter-annotator agreement, or IAA (iaa), and the most recent version of the annotation.conf files used for annotation in brat (brat).
  • data_retreival: Contains scripts for obtaining raw text data. abstracts_only contains scripts to download abstracts from PubMed, and doc_clustering contains the scripts to choose from the downloaded abstracts for downstream use. oa_subset gets full text XML from the PubMed Open Access Subset; this project is currently only using abstracts, so this code has not been utilized for downstream tasks.
  • graph_formatting: Contains scripts for turning model output into the Graphviz DOT format for visualization in Cytoscape. Known issue: although the output is valid DOT, it currently cannot be visualized in Cytoscape. TODO: resolve this issue.
  • jupyter_notebooks: Miscellaneous Jupyter notebooks with data visualizations.
  • models: Contains code to run the various kinds of models, both benchmarks and neural models, as well as a script to evaluate performance in a model-agnostic way.
  • tests: Unit tests
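
To illustrate the kind of conversion graph_formatting performs, here is a minimal sketch of writing relation triples as a DOT digraph. It is not the repository's code: it assumes model output has already been reduced to (head, relation, tail) string triples, the names quote_dot_id and triples_to_dot are hypothetical, and the example entities are made-up placeholders. It does not address the Cytoscape issue noted above.

```python
def quote_dot_id(text):
    """Return text as a double-quoted DOT ID.

    In DOT quoted strings, the double quote is the only character
    that has to be escaped, so nothing else is altered here.
    """
    return '"' + text.replace('"', '\\"') + '"'


def triples_to_dot(triples, graph_name="kg"):
    """Build a DOT digraph string from (head, relation, tail) triples.

    Assumption: each triple is a tuple of three strings. Entities
    become nodes; each relation becomes a labeled directed edge.
    """
    lines = [f"digraph {quote_dot_id(graph_name)} {{"]

    # Declare every entity once, in sorted order for stable output.
    nodes = sorted({n for head, _, tail in triples for n in (head, tail)})
    for node in nodes:
        lines.append(f"    {quote_dot_id(node)};")

    # One labeled edge per triple.
    for head, relation, tail in triples:
        lines.append(
            f"    {quote_dot_id(head)} -> {quote_dot_id(tail)} "
            f"[label={quote_dot_id(relation)}];"
        )

    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder triples, not taken from any real model output.
    example = [
        ("GeneA", "activates", "ProteinB"),
        ("ProteinB", "inhibits", "GeneC"),
    ]
    print(triples_to_dot(example))
```

Quoting every ID, rather than only those that need it, keeps entity names containing spaces or punctuation valid without extra checks.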

Languages

  • Jupyter Notebook: 66.6%
  • Python: 33.2%
  • Shell: 0.2%