khoa-yelo / bio-gdb

Human1 for graph databases

Human1 is a very recent synthesis of two landmark human metabolic networks:

This repo translates Human1's SBML distribution into flat TSV files that can be used to populate databases. Our emphasis at present is on loading graph databases: Neo4j and the Google biomedical data commons.

This first version is, of course, only preliminary. Expect changes in the next few days and weeks.

The current processing steps

  • firstExperiments/readSBML.R parses the xml into R data structures
  • firstExperiments/toTSV.R writes them out in language-neutral tab-delimited text
  • firstExperiments/import/loadAll.cypher loads these files into a Neo4j graph database
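The first two steps above can be sketched in a few lines. The repo itself does this in R; the Python below is only an illustration, under the assumption that the model follows standard SBML Level 3 core element names (listOfSpecies, listOfReactions, speciesReference). The toy SBML fragment, its ids, and the output column names are all hypothetical and are not taken from Human1 or from the repo's scripts.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A tiny hand-written SBML fragment (hypothetical ids, not from Human1).
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfSpecies>
      <species id="m1" name="glucose" compartment="c"/>
      <species id="m2" name="glucose-6-phosphate" compartment="c"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1" reversible="false">
        <listOfReactants>
          <speciesReference species="m1" stoichiometry="1"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="m2" stoichiometry="1"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""

NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}


def roles(reaction, list_name):
    """Collect (reaction, metabolite) rows from listOfReactants or listOfProducts."""
    path = f"sbml:{list_name}/sbml:speciesReference"
    return [
        {
            "reaction": reaction.get("id"),
            "metabolite": ref.get("species"),
            "stoichiometry": ref.get("stoichiometry"),
        }
        for ref in reaction.findall(path, NS)
    ]


def parse_sbml(text):
    """Split an SBML document into entity and relationship tables (lists of dicts)."""
    model = ET.fromstring(text).find("sbml:model", NS)
    tables = {"metabolites": [], "reactions": [], "reactantRoles": [], "productRoles": []}
    for s in model.findall("sbml:listOfSpecies/sbml:species", NS):
        tables["metabolites"].append(
            {"id": s.get("id"), "name": s.get("name"), "compartment": s.get("compartment")}
        )
    for r in model.findall("sbml:listOfReactions/sbml:reaction", NS):
        tables["reactions"].append({"id": r.get("id"), "reversible": r.get("reversible")})
        tables["reactantRoles"].extend(roles(r, "listOfReactants"))
        tables["productRoles"].extend(roles(r, "listOfProducts"))
    return tables


def to_tsv(rows):
    """Render a list of dicts as tab-delimited text with a header row."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


tables = parse_sbml(SBML)
for name, rows in tables.items():
    print(f"# {name}.tsv")
    print(to_tsv(rows))
```

Keeping entities and relationships in separate tables mirrors the node/edge split that a graph database import expects.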

Expect schema revisions as we work with, and start to query, these data.

The tab-delimited files (see firstExperiments/import/*.tsv)

The entities:

  • reactions.tsv: very simple, just an id and some chemical attributes
  • metabolites.tsv: called "species" by Human1
  • genes.tsv: called "geneProducts" by Human1; these are the enzymes in the reactions
  • groups.tsv: roughly speaking, these are pathways

Their relationships:

  • reactantRoles.tsv: relationship between substrate metabolites and their reactions
  • productRoles.tsv: reactions and the metabolites they produce
  • geneRoles.tsv: which genes (enzymes) participate in which reactions
  • groupMemberships.tsv: non-overlapping assignment of reactions to groups (~pathways)
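To show how the relationship files tie the entities together, here is a minimal sketch that reads toy versions of them and answers two simple questions: which metabolites lie one reaction step downstream of a given metabolite, and which genes participate in a given reaction. It also checks the non-overlapping property of group memberships. The column names (reaction, metabolite, gene, group) and all ids are assumptions made for illustration; the actual headers in firstExperiments/import/*.tsv may differ.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical file contents; real column names and ids may differ.
REACTANT_ROLES = "reaction\tmetabolite\nr1\tm1\nr2\tm2\n"
PRODUCT_ROLES = "reaction\tmetabolite\nr1\tm2\nr2\tm3\n"
GENE_ROLES = "reaction\tgene\nr1\tg1\nr2\tg1\nr2\tg2\n"
GROUP_MEMBERSHIPS = "group\treaction\nglycolysis\tr1\nglycolysis\tr2\n"


def read_tsv(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def index(rows, key, value):
    """Build a key -> set(values) lookup from relationship rows."""
    out = defaultdict(set)
    for row in rows:
        out[row[key]].add(row[value])
    return out


consumed_by = index(read_tsv(REACTANT_ROLES), "metabolite", "reaction")
produces = index(read_tsv(PRODUCT_ROLES), "reaction", "metabolite")
enzymes = index(read_tsv(GENE_ROLES), "reaction", "gene")


def one_step_products(metabolite):
    """Metabolites produced by any reaction that consumes the given metabolite."""
    return {
        product
        for reaction in consumed_by.get(metabolite, ())
        for product in produces.get(reaction, ())
    }


def overlapping_reactions(memberships):
    """Reactions assigned to more than one group; empty if groups do not overlap."""
    counts = Counter(row["reaction"] for row in memberships)
    return sorted(r for r, n in counts.items() if n > 1)


print(sorted(one_step_products("m1")))
print(sorted(enzymes["r2"]))
print(overlapping_reactions(read_tsv(GROUP_MEMBERSHIPS)))
```

In a graph database the same lookups become short path queries over the reactant, product, and gene edges. The dictionaries here only stand in for those edges.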

The SBML of one reaction record (with most namespaces removed):

License: Apache License 2.0

