graphistry / pygraphistry

PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer

Schema: Typed topology

lmeyerov opened this issue

Most property graph systems follow a typed topology, which is useful information for a path query synthesizer

  • The method should extract, from the connected database(s), a description of the typed & directed connectivities. Using Cypher syntax, this might look like:
(s:T1)-[e:T2]->(d:T3)
(s:T1)-[e:T2]->(d:T4)
(s:T1)-[e:T5]->(d:T6)
...

To simplify manipulation, this should be returned as a typed structure rather than one big string:

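from dataclasses import dataclass
from typing import List

@dataclass
class TypedTriple:
    directed: bool          # whether the edge has a direction
    source_type: str        # type (label) of the source node
    edge_type: str          # type (label) of the edge
    destination_type: str   # type (label) of the destination node

# Parameters are intentionally left open here
async def infer_typed_topology(...) -> List[TypedTriple]: ...

  • Databases may be large, e.g., billion-scale, so a strategy or controls should be in place to support working with large systems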

Note that node/edge attributes are not represented, nor are details like path lengths
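
As a rough illustration of the proposal above, here is a minimal in-memory sketch. It is not an existing PyGraphistry API: `infer_typed_topology_from_edges`, its parameters, and the `max_edges` cap are hypothetical names chosen for this example, and it reads a plain Python edge iterable instead of querying a database. The cap uses reservoir sampling as one possible "control" for large inputs; a real database backend would likely push sampling into the query instead.

```python
import random
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)  # frozen so triples are hashable and can be de-duplicated in a set
class TypedTriple:
    directed: bool
    source_type: str
    edge_type: str
    destination_type: str


def infer_typed_topology_from_edges(
    node_types: Dict[Hashable, str],                   # node id -> node type
    edges: Iterable[Tuple[Hashable, Hashable, str]],   # (source id, destination id, edge type)
    max_edges: Optional[int] = None,                   # hypothetical cap for large graphs
    seed: int = 0,
) -> List[TypedTriple]:
    """Return the distinct (source type, edge type, destination type) triples."""
    rng = random.Random(seed)
    reservoir: List[Tuple[Hashable, Hashable, str]] = []
    for i, edge in enumerate(edges):
        if max_edges is None or len(reservoir) < max_edges:
            reservoir.append(edge)
        else:
            # Reservoir sampling: keep a uniform sample of at most max_edges edges
            j = rng.randint(0, i)
            if j < max_edges:
                reservoir[j] = edge
    triples = {
        TypedTriple(True, node_types[s], t, node_types[d])
        for s, d, t in reservoir
    }
    return sorted(
        triples,
        key=lambda x: (x.source_type, x.edge_type, x.destination_type),
    )


# Tiny usage example with made-up data
example_node_types = {"a": "Person", "b": "Person", "c": "Company"}
example_edges = [("a", "b", "KNOWS"), ("b", "a", "KNOWS"), ("a", "c", "WORKS_AT")]
example_triples = infer_typed_topology_from_edges(example_node_types, example_edges)
```

Note that sampling trades completeness for speed: rare typed connections may be missed when `max_edges` is much smaller than the edge count.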

Bringing this back up as useful for gfql, louie, etc., since it came up in a customer call today (gfql user):

  • On the PyGraphistry side, probably something like:

    • user can bind which node/edge field to use as a type column, g.bind(edge_type=...), marking a property graph
    • user can call g2 = g1.schema(), with options like sample=True for big graph support
    • the return value is a new graph representing the typed ontology
  • We have schema inference implemented in louie for Neptune, but it would help to have it in PyGraphistry for all users, not just Neptune users, so probably 80% of the work here is just moving it to OSS and exposing thin bindings over what we already did
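
To make the "new graph representing the typed ontology" idea concrete, here is a hedged sketch using only the standard library. It is neither the louie implementation nor a PyGraphistry method: `schema_tables` and all of its column-name parameters are hypothetical, standing in for whatever type columns the user would bind. It collapses an instance graph into a node table of types and an edge table of typed connections with counts.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def schema_tables(
    nodes: List[Dict[str, Any]],
    edges: List[Dict[str, Any]],
    node_id: str = "id",       # hypothetical column names, analogous to bound fields
    node_type: str = "type",
    src: str = "src",
    dst: str = "dst",
    edge_type: str = "type",
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Collapse an instance graph into (schema nodes, schema edges)."""
    type_of = {n[node_id]: n[node_type] for n in nodes}
    # Count how many instance edges back each typed connection
    counts = Counter(
        (type_of[e[src]], e[edge_type], type_of[e[dst]]) for e in edges
    )
    schema_nodes = [{"type": t} for t in sorted(set(type_of.values()))]
    schema_edges = [
        {"src_type": s, "edge_type": t, "dst_type": d, "count": c}
        for (s, t, d), c in sorted(counts.items())
    ]
    return schema_nodes, schema_edges


# Tiny usage example with made-up data
demo_nodes = [
    {"id": 1, "type": "Person"},
    {"id": 2, "type": "Person"},
    {"id": 3, "type": "Company"},
]
demo_edges = [
    {"src": 1, "dst": 2, "type": "KNOWS"},
    {"src": 1, "dst": 3, "type": "WORKS_AT"},
    {"src": 2, "dst": 3, "type": "WORKS_AT"},
]
demo_schema_nodes, demo_schema_edges = schema_tables(demo_nodes, demo_edges)
```

The two returned tables have the same nodes-plus-edges shape as an ordinary graph, so in principle they could be loaded into DataFrames and plotted like any other graph; how that would be wired into g.schema() is left open here.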