Schema: Typed topology

lmeyerov opened this issue a year ago · comments

Most property graph systems follow a typed topology, which is useful information for a path query synthesizer.

The method should extract, for the connected database(s), a description of the typed and directed connectivity. In Cypher syntax, this might look like:

(s:T1)-[e:T2]->(d:T3)
(s:T1)-[e:T2]->(d:T4)
(s:T1)-[e:T5]->(d:T6)
...

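To simplify manipulation, this should be a typed representation rather than one big string:

from dataclasses import dataclass
from typing import List

@dataclass
class TypedTriple:
    directed: bool          # True for (s)-[e]->(d), False for (s)-[e]-(d)
    source_type: str        # e.g. T1
    edge_type: str          # e.g. T2
    destination_type: str   # e.g. T3

# Signature sketch only: parameters are left open
async def infer_typed_topology(...) -> List[TypedTriple]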
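
As an illustration of the representation above, here is a minimal sketch that derives the distinct triples from an in-memory edge list and renders them back into the Cypher-style patterns. The helper `triples_from_edges`, the `to_cypher` method, and the input shapes are assumptions made for this example, not part of any existing API:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)  # frozen makes instances hashable, so they can be deduplicated in a set
class TypedTriple:
    directed: bool
    source_type: str
    edge_type: str
    destination_type: str

    def to_cypher(self) -> str:
        # Render in the same pattern style as the listing above
        arrow = "->" if self.directed else "-"
        return f"(s:{self.source_type})-[e:{self.edge_type}]{arrow}(d:{self.destination_type})"


def triples_from_edges(
    node_types: Dict[str, str],              # node id -> node type
    edges: Iterable[Tuple[str, str, str]],   # (source id, edge type, destination id)
    directed: bool = True,
) -> List[TypedTriple]:
    # Hypothetical helper: keep each distinct typed connectivity once
    seen = {
        TypedTriple(directed, node_types[s], t, node_types[d])
        for s, t, d in edges
    }
    return sorted(seen, key=lambda x: (x.source_type, x.edge_type, x.destination_type))


node_types = {"a": "Person", "b": "Person", "c": "Company"}
edges = [("a", "KNOWS", "b"), ("b", "KNOWS", "a"), ("a", "WORKS_AT", "c")]
for triple in triples_from_edges(node_types, edges):
    print(triple.to_cypher())
# (s:Person)-[e:KNOWS]->(d:Person)
# (s:Person)-[e:WORKS_AT]->(d:Company)
```

Attributes and path lengths are deliberately left out here, matching the scope described in the issue.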

Databases may be large, e.g., billion-scale, so a strategy or controls should be in place for working with large systems.
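
One possible control, sketched here under assumptions, is a scan budget: stop after a fixed number of edges and report whether the result is complete. The function name `scan_typed_triples` and the `max_edges` parameter are hypothetical, and the input is assumed to be a stream of already-typed `(source_type, edge_type, destination_type)` tuples:

```python
from itertools import islice
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (source_type, edge_type, destination_type)
_DONE = object()  # sentinel used to detect an exhausted stream


def scan_typed_triples(
    typed_edges: Iterable[Triple],
    max_edges: Optional[int] = None,
) -> Tuple[Set[Triple], bool]:
    it = iter(typed_edges)
    # Read at most max_edges items; None means scan everything
    budgeted = it if max_edges is None else islice(it, max_edges)
    triples = set(budgeted)
    # The result is complete only if nothing is left in the stream
    complete = next(it, _DONE) is _DONE
    return triples, complete


stream = [("T1", "T2", "T3")] * 3 + [("T1", "T5", "T6")]
print(scan_typed_triples(stream, max_edges=2))
# ({('T1', 'T2', 'T3')}, False)
```

Taking the first N edges is biased toward whatever order the database returns, so rare triples can be missed; a real implementation would more likely push random sampling or per-type limits into the query itself.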

Note that node/edge attributes are not represented, nor are details like path lengths

lmeyerov · Answer 1 · Thu Oct 10 2024 08:14:34 GMT+0800 (China Standard Time)

Bringing this back up as useful for gfql, louie, etc., as it came up in a customer call today (gfql user):

On the PyGraphistry side, probably something like:
- user can bind which node/edge field to use as a type column, g.bind(edge_type=...), marking a property graph
- user can call g2 = g1.schema(), with options like sample=True for big graph support
- the return value is a new graph representing the typed ontology

We already have schema inference implemented in louie for Neptune, but it would help to have it in PyGraphistry for all users, not just Neptune users, so probably 80% of the work here is moving it to OSS and exposing thin bindings over what we already did.
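
To make the "new graph representing the typed ontology" idea concrete, here is a rough sketch in plain Python. It does not use or describe the PyGraphistry API: `schema_graph`, its parameters, and the row and column names are all assumptions for illustration. Each node type becomes a schema node and each distinct typed connectivity becomes a schema edge, with counts attached:

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, str]


def schema_graph(
    node_rows: List[Row],
    edge_rows: List[Row],
    node_id: str = "id",
    node_type: str = "type",   # assumed name of the bound node type column
    src: str = "src",
    dst: str = "dst",
    edge_type: str = "type",   # assumed name of the bound edge type column
) -> Tuple[List[dict], List[dict]]:
    # Look up each node's type once
    type_of = {row[node_id]: row[node_type] for row in node_rows}
    node_counts = Counter(type_of.values())
    # Count every (source type, edge type, destination type) combination
    edge_counts = Counter(
        (type_of[e[src]], e[edge_type], type_of[e[dst]]) for e in edge_rows
    )
    schema_nodes = [{"type": t, "count": n} for t, n in sorted(node_counts.items())]
    schema_edges = [
        {"src": s, "type": t, "dst": d, "count": n}
        for (s, t, d), n in sorted(edge_counts.items())
    ]
    return schema_nodes, schema_edges


nodes = [
    {"id": "a", "type": "Person"},
    {"id": "b", "type": "Person"},
    {"id": "c", "type": "Company"},
]
edges = [
    {"src": "a", "dst": "b", "type": "KNOWS"},
    {"src": "b", "dst": "a", "type": "KNOWS"},
    {"src": "a", "dst": "c", "type": "WORKS_AT"},
]
schema_nodes, schema_edges = schema_graph(nodes, edges)
print(schema_edges)
# [{'src': 'Person', 'type': 'KNOWS', 'dst': 'Person', 'count': 2},
#  {'src': 'Person', 'type': 'WORKS_AT', 'dst': 'Company', 'count': 1}]
```

Returning the schema as node and edge tables means it can be handled like any other graph, which is the appeal of the g1.schema() shape proposed above.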