Schema: Typed topology
lmeyerov opened this issue · comments
Most property graph systems follow a typed topology, which is useful information for a path query synthesizer.
- The method should extract, for the connected database(s), a description of the typed and directed connectivities. In Cypher syntax, this might look like:
(s:T1)-[e:T2]->(d:T3)
(s:T1)-[e:T2]->(d:T4)
(s:T1)-[e:T5]->(d:T6)
...
To simplify manipulation, this should be represented as a typed structure rather than one big string:
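from dataclasses import dataclass
from typing import List

@dataclass
class TypedTriple:
    directed: bool          # whether the edge type is directed
    source_type: str        # node type at the source end
    edge_type: str          # edge (relationship) type
    destination_type: str   # node type at the destination end

async def infer_typed_topology(...) -> List[TypedTriple]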
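As a rough illustration of what the inference could compute, here is a minimal in-memory sketch. It is synchronous and works on plain Python collections rather than a connected database, so the function name `infer_typed_topology_in_memory`, its parameters, and the sample data are all assumptions for illustration, not the proposed API.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, List, Tuple


@dataclass(frozen=True)  # frozen so instances are hashable and can be deduplicated in a set
class TypedTriple:
    directed: bool
    source_type: str
    edge_type: str
    destination_type: str


def infer_typed_topology_in_memory(
    node_types: Dict[Hashable, str],
    edges: Iterable[Tuple[Hashable, str, Hashable]],
    directed: bool = True,
) -> List[TypedTriple]:
    """Hypothetical helper: collect the distinct (source type, edge type, destination type) triples."""
    seen = set()
    for src, edge_type, dst in edges:
        seen.add(TypedTriple(directed, node_types[src], edge_type, node_types[dst]))
    # Sort so the result is deterministic regardless of edge order
    return sorted(seen, key=lambda t: (t.source_type, t.edge_type, t.destination_type))


# Made-up sample data: node id -> node type, and (source id, edge type, destination id)
node_types = {"a": "Person", "b": "Person", "c": "Company"}
edges = [("a", "KNOWS", "b"), ("b", "KNOWS", "a"), ("a", "WORKS_AT", "c")]

topology = infer_typed_topology_in_memory(node_types, edges)
for t in topology:
    print(f"(s:{t.source_type})-[e:{t.edge_type}]->(d:{t.destination_type})")
```

The two `KNOWS` edges collapse into a single triple, which is the point: the output size depends on the number of type combinations, not the number of edges.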
- Databases may be large, e.g., billion-scale, so a strategy or controls should be in place to support working with large systems
Note that node/edge attributes are not represented, nor are details like path lengths.
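For the large-database concern, one possible control is a fixed sampling budget over the edge stream. The sketch below uses reservoir sampling (Algorithm R), which is a technique swapped in here for illustration and not something the issue prescribes. The helper names and the synthetic edge stream are assumptions.

```python
import random
from typing import Iterable, Iterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item  # replace with probability k / (i + 1)
    return reservoir


# Made-up type combinations standing in for a real database scan
KINDS = [
    ("Person", "KNOWS", "Person"),
    ("Person", "WORKS_AT", "Company"),
    ("Company", "OWNS", "Asset"),
]


def typed_edge_stream(n: int) -> Iterator[Tuple[str, str, str]]:
    """Hypothetical stream of (source type, edge type, destination type) per edge."""
    for i in range(n):
        yield KINDS[i % len(KINDS)]


sample = reservoir_sample(typed_edge_stream(100_000), k=1_000, seed=0)
sampled_triples = set(sample)  # distinct typed triples observed in the sample
```

Memory stays bounded by `k` however long the stream is. The trade-off is that rare type combinations may be missed, so a sampled topology should be treated as a lower bound on the true one.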
Bringing this back up as useful for gfql, louie, etc., as it came up in a customer call today (gfql user):
- On the PyGraphistry side, probably something like:
  - user can bind which node/edge field to use as a type column, g.bind(edge_type=...), marking a property graph
  - user can call g2 = g1.schema(), with options like sample=True for big graph support
  - return is a new graph representing the typed ontology
- We have schema inference implemented in louie for neptune, but it would help in pygraphistry for all users, not just neptune, so probably 80% of the work here is just moving it to OSS and exposing thin bindings over what we already did
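To make "a new graph representing the typed ontology" concrete, here is a small sketch of the data such a call might return: one node per type and one edge per distinct typed triple. It uses plain Python records and does not call PyGraphistry. The function name `schema_graph` and the column names `id`, `type`, `src`, `dst`, and `edge_type` are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def schema_graph(
    triples: Iterable[Tuple[str, str, str]],
) -> Tuple[List[Record], List[Record]]:
    """Hypothetical helper: turn (source type, edge type, destination type)
    triples into node and edge records of a schema graph."""
    unique = sorted(set(triples))  # deduplicate and order deterministically
    type_names = sorted({t for s, _, d in unique for t in (s, d)})
    nodes = [{"id": t, "type": t} for t in type_names]
    edges = [{"src": s, "dst": d, "edge_type": e} for s, e, d in unique]
    return nodes, edges


# Made-up triples, with a repeat to show deduplication
triples = [
    ("Person", "KNOWS", "Person"),
    ("Person", "WORKS_AT", "Company"),
    ("Person", "KNOWS", "Person"),
]

schema_nodes, schema_edges = schema_graph(triples)
for edge in schema_edges:
    print(f'{edge["src"]} -[{edge["edge_type"]}]-> {edge["dst"]}')
```

Because the result is itself a node list plus an edge list, it could be loaded back into the same plotting and query tooling as any other graph, which seems to be the appeal of returning a graph rather than a flat list.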