This project examines graph-based data storage for managing and connecting nodes of four kinds (Compounds, Diseases, Genes, and Anatomies), with an interactive graphical user interface for running queries.
- The "nodes_test.tsv" file contains over 20,000 nodes of these four kinds; each node has an ID, a Name, and a Kind.
ID | Name | Kind |
---|---|---|
Anatomy::UBERON:0000042 | serous membrane | Anatomy |
Compound::DB00396 | Progesterone | Compound |
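For a quick look at the node file outside Neo4j, the sketch below reads it with Python's standard `csv` module and counts nodes per kind. It assumes the header row uses the lowercase names `id`, `name` and `kind` that the LOAD CSV query reads (the table above shows them capitalized); `read_nodes` is a hypothetical helper, not part of the project.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def read_nodes(handle: TextIO) -> Dict[str, dict]:
    """Read a tab-separated node file into a dict keyed by node ID.

    Assumes the header row is `id<TAB>name<TAB>kind`, matching the
    row.id / row.name / row.kind fields used by the Cypher load query.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["id"]: {"name": row["name"], "kind": row["kind"]} for row in reader}


if __name__ == "__main__":
    # The two sample rows from the table above.
    sample = io.StringIO(
        "id\tname\tkind\n"
        "Anatomy::UBERON:0000042\tserous membrane\tAnatomy\n"
        "Compound::DB00396\tProgesterone\tCompound\n"
    )
    nodes = read_nodes(sample)
    # Count how many nodes of each kind were read.
    print(Counter(node["kind"] for node in nodes.values()))
```

On the real file, pass `open("nodes_test.tsv", newline="", encoding="utf-8")` instead of the in-memory sample.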
- The "edges_test.tsv" file contains over 1M edges, each linking a source node to a target node and labeled with a relationship type (metaedge) whose abbreviation is described in the "metaedges.tsv" file.
Metaedge | abbreviation | edges | source_nodes | target_nodes | unbiased |
---|---|---|---|---|---|
Anatomy - downregulates - Gene | AdG | 102240 | 36 | 15097 | 102240 |
Anatomy - expresses - Gene | AeG | 526407 | 241 | 18094 | 453477 |
Anatomy - upregulates - Gene | AuG | 97848 | 36 | 15929 | 97848 |
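As a sanity check against the `edges` column of "metaedges.tsv", the edge file can be tallied per metaedge with the standard library. This is a sketch that assumes the header row is `source`, `metaedge`, `target` (the fields the Cypher edge load reads) and that the metaedge column holds abbreviations such as `AuG`; `count_edges` and the IDs in the sample are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, Set, TextIO, Tuple


def count_edges(handle: TextIO) -> Tuple[Counter, Dict[Tuple[str, str], Set[str]]]:
    """Count edges per metaedge and index targets by (source, metaedge).

    Assumes the header row is `source<TAB>metaedge<TAB>target`.
    """
    counts: Counter = Counter()
    targets: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        # Skip incomplete rows, like the IS NOT NULL checks in the Cypher load.
        if not (row.get("source") and row.get("metaedge") and row.get("target")):
            continue
        counts[row["metaedge"]] += 1
        targets[(row["source"], row["metaedge"])].add(row["target"])
    return counts, targets


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    sample = io.StringIO(
        "source\tmetaedge\ttarget\n"
        "Anatomy::A1\tAuG\tGene::G1\n"
        "Anatomy::A1\tAdG\tGene::G2\n"
    )
    counts, targets = count_edges(sample)
    print(counts)
    print(targets[("Anatomy::A1", "AuG")])
```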
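- The TSV files can be hosted on a local Python HTTP server by running `python3 -m http.server` in the directory that contains them; it listens on port 8000 by default, the port used in the LOAD CSV URLs.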
- The Neo4j server requires authentication and must keep running while data is loaded and queries are executed.
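- When creating the database, load the nodes and the edges only once: the node load uses CREATE and the edge load creates a new relationship for every row, so running either load again adds duplicates.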
- To execute a query from the terminal, run: `python3 projectBD.py <"QUERY SELECTED">`
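- To find possible treatments for a Disease that has no direct connection to any Compound, start from the Anatomies in which the Disease localizes (DlA) and follow the Genes that each Anatomy up-regulates or down-regulates (AuG/AdG). A Compound is a candidate treatment if it regulates the same Gene in the opposite direction, i.e. it down-regulates (CdG) a Gene that the Anatomy up-regulates, or up-regulates (CuG) a Gene that the Anatomy down-regulates, and it is not already linked to the Disease by a treats (CtD) or palliates (CpD) edge. This would create the following graph.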
The following Cypher queries each solve one part of the project, using Neo4j as a graph-based NoSQL store.
// Look up one Disease node by ID (the node kind is stored in the `name` property).
MATCH (n:Data WHERE n.name = 'Disease' AND
      n.id = 'Disease::DOID:8577')
RETURN n
// Compounds that palliate (CpD) or treat (CtD) Disease::DOID:7148.
MATCH (n:Data)-[:CpD|CtD]->(b:Data WHERE
      b.id = 'Disease::DOID:7148') RETURN n
// Genes associated (DaG) with Disease::DOID:7148.
MATCH (a:Data WHERE a.id = 'Disease::DOID:7148')
      -[:DaG]->(n:Data WHERE n.name = 'Gene') RETURN n
// Anatomies in which Disease::DOID:7148 localizes (DlA).
MATCH (a:Data WHERE a.id = 'Disease::DOID:7148')
      -[:DlA]->(n:Data) RETURN n
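The lookup queries above hard-code a disease ID. A script such as projectBD.py could instead pick a query by name from the command line and pass the ID as a Cypher parameter. The sketch below assumes the official `neo4j` Python driver (5.x, which provides `Driver.execute_query`), hypothetical query names, and connection settings in hypothetical `NEO4J_URI` / `NEO4J_USER` / `NEO4J_PASSWORD` environment variables; it is not the project's actual script.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical mapping from a command-line query name to parameterized Cypher.
QUERIES = {
    "disease": "MATCH (n:Data WHERE n.name = 'Disease' AND n.id = $id) RETURN n",
    "compounds": "MATCH (n:Data)-[:CpD|CtD]->(:Data {id: $id}) RETURN n",
    "genes": "MATCH (:Data {id: $id})-[:DaG]->(n:Data WHERE n.name = 'Gene') RETURN n",
    "anatomies": "MATCH (:Data {id: $id})-[:DlA]->(n:Data) RETURN n",
}


def select_query(argv: List[str]) -> Tuple[str, Dict[str, str]]:
    """Return (cypher, params) for `projectBD.py <query-name> <node-id>`."""
    if len(argv) != 3 or argv[1] not in QUERIES:
        names = "|".join(sorted(QUERIES))
        raise SystemExit(f"usage: {argv[0]} <{names}> <node-id>")
    return QUERIES[argv[1]], {"id": argv[2]}


def main(argv: List[str]) -> None:
    """Run the selected query and print each record as a dict."""
    # Imported lazily so select_query() works without the driver installed.
    from neo4j import GraphDatabase

    cypher, params = select_query(argv)
    uri = os.environ.get("NEO4J_URI", "neo4j://localhost:7687")
    auth = (os.environ.get("NEO4J_USER", "neo4j"), os.environ["NEO4J_PASSWORD"])
    with GraphDatabase.driver(uri, auth=auth) as driver:
        records, _, _ = driver.execute_query(cypher, parameters_=params)
    for record in records:
        print(record.data())
```

The script's entry point would call `main(sys.argv)`, e.g. `python3 projectBD.py genes Disease::DOID:7148`. Passing the ID as `$id` rather than splicing it into the query text avoids quoting problems and lets Neo4j reuse the query plan.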
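// Candidate treatments: compounds that regulate, in the opposite direction, a gene that an
// anatomy localized by the disease up- or down-regulates, and that are not yet linked to it.
MATCH (d:Data WHERE d.name = 'Disease')-[:DlA]->(a:Data WHERE a.name = 'Anatomy'),
      (a)-[r1:AuG|AdG]->(g:Data WHERE g.name = 'Gene'),
      (n:Data WHERE n.name = 'Compound')-[r2:CdG|CuG]->(g)
WHERE ((type(r1) = 'AuG' AND type(r2) = 'CdG') OR (type(r1) = 'AdG' AND type(r2) = 'CuG'))
  AND NOT EXISTS { (n)-[:CtD|CpD]->(d) }
RETURN DISTINCT n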
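The candidate-treatment approach described earlier can also be sketched without a database, which makes the "opposite direction" rule easy to test on a toy graph. This is an in-memory illustration only: edges are assumed to be `(source, metaedge, target)` triples using the abbreviations from "metaedges.tsv", and `candidate_treatments` plus the sample IDs are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (source id, metaedge abbreviation, target id)

# Anatomy regulation -> compound regulation in the opposite direction.
OPPOSITE = {"AuG": "CdG", "AdG": "CuG"}


def candidate_treatments(disease: str, edges: Iterable[Edge]) -> Set[str]:
    """Compounds that regulate, in the opposite direction, a gene that an
    anatomy localized by `disease` up- or down-regulates, excluding
    compounds already linked to `disease` by CtD or CpD."""
    out: Dict[Tuple[str, str], Set[str]] = {}
    compounds_by_gene: Dict[Tuple[str, str], Set[str]] = {}
    for source, metaedge, target in edges:
        out.setdefault((source, metaedge), set()).add(target)
        if metaedge in ("CdG", "CuG"):
            compounds_by_gene.setdefault((target, metaedge), set()).add(source)

    candidates: Set[str] = set()
    for anatomy in out.get((disease, "DlA"), set()):
        for anatomy_edge, compound_edge in OPPOSITE.items():
            for gene in out.get((anatomy, anatomy_edge), set()):
                candidates |= compounds_by_gene.get((gene, compound_edge), set())

    # Drop compounds that already treat or palliate the disease.
    already_linked = {
        compound
        for compound in candidates
        if disease in (out.get((compound, "CtD"), set()) | out.get((compound, "CpD"), set()))
    }
    return candidates - already_linked


if __name__ == "__main__":
    # Hypothetical toy graph.
    toy = [
        ("Disease::D1", "DlA", "Anatomy::A1"),
        ("Anatomy::A1", "AuG", "Gene::G1"),
        ("Compound::C1", "CdG", "Gene::G1"),  # opposite direction: candidate
        ("Compound::C2", "CuG", "Gene::G1"),  # same direction: ignored
    ]
    print(sorted(candidate_treatments("Disease::D1", toy)))
```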
// Load nodes once: the kind goes into `name`, the human-readable name into `dataName`.
LOAD CSV WITH HEADERS FROM "http://localhost:8000/nodes_test.tsv"
AS row FIELDTERMINATOR "\t"
CREATE (n:Data {name: row.kind, id: row.id, dataName: row.name})
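If the TSV files cannot be served over HTTP, the same nodes could be sent from Python in batches with `UNWIND`, keeping the property mapping of the LOAD CSV query above. This sketch assumes the `neo4j` 5.x driver's `Driver.execute_query` and lowercase `id` / `name` / `kind` headers; `batches`, `load_nodes_batched` and the batch size are hypothetical choices, not the project's loader.

```python
import csv
from itertools import islice
from typing import Iterable, Iterator, List

# Same property mapping as the LOAD CSV query: name <- kind, dataName <- name.
CREATE_NODES = (
    "UNWIND $rows AS row "
    "CREATE (:Data {name: row.kind, id: row.id, dataName: row.name})"
)


def batches(rows: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def load_nodes_batched(path: str, driver, size: int = 10_000) -> None:
    """Create one :Data node per TSV row, one transaction per batch.

    `driver` is an open neo4j Driver. Like the LOAD CSV version, this uses
    CREATE, so it should be run only once.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for batch in batches(reader, size):
            driver.execute_query(CREATE_NODES, parameters_={"rows": batch})
```

Batching keeps each transaction small while still avoiding one round trip per node.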
// Load edges once (requires the APOC plugin); the metaedge column supplies the relationship type.
LOAD CSV WITH HEADERS FROM "http://localhost:8000/edges_test.tsv" AS row FIELDTERMINATOR "\t"
WITH row
WHERE row.source IS NOT NULL AND row.target IS NOT NULL AND row.metaedge IS NOT NULL
MERGE (s:Data {id: row.source})
MERGE (t:Data {id: row.target})
WITH s, t, row
CALL apoc.create.relationship(s, row.metaedge, {}, t) YIELD rel
RETURN count(rel) AS relationshipsCreated
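The edge load calls `apoc.create.relationship` because, in the Neo4j versions this sketch assumes, a relationship type has to be written literally in the query text rather than passed as a parameter. Without APOC, one option is to group the edges by metaedge in Python and build one statement per type, validating each type before interpolating it. Unlike the LOAD CSV version, this hypothetical `edge_statements` helper uses MATCH, so it assumes the nodes were loaded first; each returned pair would then be run with the driver.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Relationship types are interpolated into the query text, so accept only
# plain identifiers such as "CtD" or "AuG".
VALID_TYPE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def edge_statements(rows: Iterable[dict]) -> List[Tuple[str, dict]]:
    """Group edge rows by metaedge and build one (cypher, params) pair per type."""
    pairs: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        if not (row.get("source") and row.get("metaedge") and row.get("target")):
            continue  # mirror the IS NOT NULL filter in the LOAD CSV query
        if not VALID_TYPE.fullmatch(row["metaedge"]):
            raise ValueError(f"unexpected metaedge: {row['metaedge']!r}")
        pairs[row["metaedge"]].append({"source": row["source"], "target": row["target"]})

    statements = []
    for metaedge, batch in sorted(pairs.items()):
        cypher = (
            "UNWIND $pairs AS pair "
            "MATCH (s:Data {id: pair.source}), (t:Data {id: pair.target}) "
            f"CREATE (s)-[:`{metaedge}`]->(t)"
        )
        statements.append((cypher, {"pairs": batch}))
    return statements
```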