graph-net-project

🚀 This project examines graph-based data storage for managing and connecting nodes of four kinds (Compounds, Diseases, Genes, and Anatomies), with an interactive graphical user interface for running queries.

📖 Files

The "nodes_test.tsv" file contains over 20,000 nodes of these four kinds. Each node has three attributes: ID, Name, and Kind.

ID	Name	Kind
Anatomy::UBERON:0000042	serous membrane	Anatomy
Compound::DB00396	Progesterone	Compound
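
As a minimal sketch of how rows like these could be inspected before loading, the snippet below parses tab-separated node text with Python's standard csv module and counts nodes per kind. The function names and the inline sample are illustrative assumptions, not part of the project code. Header names are lowercased so the sketch works whether the file uses "Kind" or "kind".

```python
import csv
import io
from collections import Counter


def read_nodes(tsv_text):
    """Parse tab-separated node rows into dicts keyed by lowercased header names."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = [name.strip().lower() for name in next(reader)]
    return [dict(zip(header, row)) for row in reader if row]


def count_kinds(tsv_text):
    """Count how many nodes of each kind appear in the text."""
    return Counter(node["kind"] for node in read_nodes(tsv_text))


# The two sample rows shown above, used here as stand-in data.
SAMPLE_NODES = (
    "ID\tName\tKind\n"
    "Anatomy::UBERON:0000042\tserous membrane\tAnatomy\n"
    "Compound::DB00396\tProgesterone\tCompound\n"
)

kind_counts = count_kinds(SAMPLE_NODES)
```

On the full file, the same call would give a quick check that all four kinds are present before the data is loaded into the database.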

The "edges_test.tsv" file contains over 1M edges, each linking a source node to a target node. Every edge is labeled with a relationship type, and the relationship types are described in the "metaedges.tsv" file.

Metaedge	abbreviation	edges	source_nodes	target_nodes	unbiased
Anatomy - downregulates - Gene	AdG	102240	36	15097	102240
Anatomy - expresses - Gene	AeG	526407	241	18094	453477
Anatomy - upregulates - Gene	AuG	97848	36	15929	97848
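
The abbreviations in the second column (for example AeG) are the relationship types used in the Cypher queries of this project. As a hedged sketch, the snippet below builds a lookup from abbreviation to metaedge name and edge count. The function name `index_metaedges` is an assumption for illustration, and the sample contains only rows copied from the listing above.

```python
import csv
import io


def index_metaedges(tsv_text):
    """Map each abbreviation (e.g. 'AeG') to its metaedge name and edge count."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    index = {}
    for row in reader:
        index[row["abbreviation"]] = {
            "metaedge": row["Metaedge"],
            "edges": int(row["edges"]),
        }
    return index


# Header and two rows copied from the listing above.
SAMPLE_METAEDGES = (
    "Metaedge\tabbreviation\tedges\tsource_nodes\ttarget_nodes\tunbiased\n"
    "Anatomy - expresses - Gene\tAeG\t526407\t241\t18094\t453477\n"
    "Anatomy - upregulates - Gene\tAuG\t97848\t36\t15929\t97848\n"
)

metaedge_index = index_metaedges(SAMPLE_METAEDGES)
```

A lookup like this makes it easy to translate a relationship type seen in a query, such as AuG, back to its readable name.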

💡 NOTE

Files can be hosted on a local Python server using: python3 -m http.server
The Neo4j database server requires authentication and must remain running while queries are executed.
When creating a database, load the data only once for both nodes and edges. The node import uses CREATE, so loading it a second time would duplicate every node.
To execute a query from the terminal:
- run: python3 projectBD.py <"QUERY SELECTED">
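
The internals of projectBD.py are not shown here, so the following is only a sketch of how a command-line argument could be mapped to a Cypher query and executed. The query names, the `$disease_id` parameter, and both function names are hypothetical. `run_query` assumes the official `neo4j` Python driver is installed and a server is reachable at the given URI.

```python
# Hypothetical query names; the names accepted by projectBD.py may differ.
QUERIES = {
    "disease": "MATCH (n:Data {id: $disease_id}) RETURN n",
    "compounds": "MATCH (n:Data)-[:CpD|CtD]->(b:Data {id: $disease_id}) RETURN n",
    "genes": "MATCH (a:Data {id: $disease_id})-[:DaG]->(n:Data) RETURN n",
    "anatomy": "MATCH (a:Data {id: $disease_id})-[:DlA]->(n:Data) RETURN n",
}


def select_query(name):
    """Return the Cypher text for a query name, or raise ValueError if unknown."""
    try:
        return QUERIES[name]
    except KeyError:
        raise ValueError(
            f"unknown query {name!r}; choose from {sorted(QUERIES)}"
        ) from None


def run_query(name, disease_id, uri, user, password):
    """Run the selected query against a live Neo4j server and return row dicts."""
    # Imported lazily so select_query can be used without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            result = session.run(select_query(name), disease_id=disease_id)
            return [record.data() for record in result]
```

Passing the disease ID as a query parameter, rather than formatting it into the Cypher string, avoids quoting problems with IDs such as Disease::DOID:7148.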

Example Node Structure

To find possible treatments for Diseases that have no direct connection to any Compound, the query starts from the Anatomy in which the Disease localizes and looks at the Genes that this Anatomy up-regulates or down-regulates. It then finds Compounds that regulate the same Genes in the opposite direction. This would create the following graph.
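
The traversal can be illustrated on a small in-memory graph. The sketch below is one reading of the description above, not the project's implementation: edges are grouped by relationship type as (source, target) pairs, and all node IDs (D1, A1, G1, and so on) are made up for the example.

```python
def candidate_compounds(disease, edges):
    """Return compounds that regulate a gene in the opposite direction to an
    anatomy where `disease` localizes, excluding compounds already linked to it."""
    anatomies = {t for s, t in edges["DlA"] if s == disease}
    up_genes = {t for s, t in edges["AuG"] if s in anatomies}
    down_genes = {t for s, t in edges["AdG"] if s in anatomies}
    # Compounds that already treat (CtD) or palliate (CpD) the disease.
    known = {s for kind in ("CtD", "CpD") for s, t in edges[kind] if t == disease}
    # Opposite direction: compound down-regulates what the anatomy up-regulates,
    # or up-regulates what the anatomy down-regulates.
    candidates = {s for s, t in edges["CdG"] if t in up_genes}
    candidates |= {s for s, t in edges["CuG"] if t in down_genes}
    return candidates - known


# Toy graph with hypothetical IDs.
TOY_EDGES = {
    "DlA": {("D1", "A1")},
    "AuG": {("A1", "G1")},
    "AdG": {("A1", "G2")},
    "CdG": {("C1", "G1"), ("C3", "G2")},  # C3 acts in the same direction as A1
    "CuG": {("C2", "G2")},
    "CtD": {("C2", "D1")},                # C2 already treats D1
    "CpD": set(),
}

toy_candidates = candidate_compounds("D1", TOY_EDGES)
```

In this toy graph only C1 remains: C3 regulates G2 in the same direction as the anatomy, and C2 is already a known treatment for D1.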

📝 QUERIES

Each of the following Cypher queries solves a specific part of the project, using Neo4j as a graph-based NoSQL store.

Return Disease Name

MATCH (n WHERE n.name='Disease' AND 
n.id ='Disease::DOID:8577') 
RETURN n

Return Compounds that Palliate or Treat Disease

MATCH m=(n:Data)-[:CpD|CtD]->(b:Data where 
b.id='Disease::DOID:7148') RETURN n

Return Genes that Cause this Disease

MATCH p=(a:Data WHERE a.id='Disease::DOID:7148')
-[r:DaG]->(n:Data where n.name ='Gene') RETURN n

Return Where Disease Occurs

MATCH p=(a:Data WHERE a.id ='Disease::DOID:7148')
-[r:DlA]->(n:Data) RETURN n

Potential Cures to Diseases

MATCH p = (d:Data WHERE d.name = 'Disease')-[:DlA]->
(a:Data WHERE a.name = 'Anatomy')-[:AuG|AdG]->(g:Data WHERE g.name = 'Gene')
WITH d, a, g
MATCH (n:Data WHERE n.name = 'Compound')-[:CdG|CuG]->
(f:Data WHERE f.name = 'Gene' AND f.id = g.id)
WITH d, a, g, n
MATCH (n) WHERE NOT (n)-[:CtD|CpD]->(d)
RETURN n

Loading Nodes: ALREADY LOADED

LOAD CSV WITH HEADERS FROM "http://localhost:8000/nodes_test.tsv"
AS row FIELDTERMINATOR "\t"
CREATE (n:Data {name: row.kind, id: row.id, dataName: row.name})

Loading Edges: ALREADY LOADED

LOAD CSV WITH HEADERS FROM "http://localhost:8000/edges_test.tsv" AS row FIELDTERMINATOR "\t"
WITH row
WHERE row.source IS NOT NULL AND row.target IS NOT NULL AND row.metaedge IS NOT NULL
MERGE (s:Data {id: row.source})
MERGE (t:Data {id: row.target})
WITH s, t, row
CALL apoc.create.relationship(s, row.metaedge, {}, t) YIELD rel
RETURN *
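
Before running the import, the edge file can be checked in Python for incomplete rows. The sketch below mirrors the intent of the WHERE filter by skipping rows with a missing or empty value. It assumes the columns are named source, metaedge, and target. The function name and the sample rows are illustrative, and Gene::1 is a made-up ID.

```python
import csv
import io

REQUIRED_COLUMNS = ("source", "metaedge", "target")


def valid_edges(tsv_text):
    """Yield (source, metaedge, target) for rows where all three values are present."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # row.get() returns None for a missing column and "" for an empty field.
        if all(row.get(col) for col in REQUIRED_COLUMNS):
            yield tuple(row[col] for col in REQUIRED_COLUMNS)


# Stand-in data: the second row has an empty target and is skipped.
SAMPLE_EDGES = (
    "source\tmetaedge\ttarget\n"
    "Anatomy::UBERON:0000042\tAeG\tGene::1\n"
    "Compound::DB00396\tCuG\t\n"
)

checked_edges = list(valid_edges(SAMPLE_EDGES))
```

Comparing the number of valid rows with the total row count gives a quick estimate of how many edges the import will skip.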

About

To examine the versatility of database management systems, this project uses Neo4j along with Spark to establish connections between nodes and to optimize queries that answer user questions.
