metagraph-dev / metagraph

Multi-target API for graph analytics with Dask

Home Page:https://metagraph.readthedocs.io/en/latest/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Utility methods + callables

jim22k opened this issue · comments

Some type conversions and algorithms are obvious:

  • Extract from NodeMap, yields a NodeSet
  • Filter NodeMap using NodeSet, yields a smaller NodeMap
  • Intersect two NodeSets, yields a smaller NodeSet
  • Count the number of nodes in a NodeSet/NodeMap
  • Count the number of edges in a Graph

Some of these are already proposed as required wrapper methods.

Other useful algorithms require a callable:

  • Neighbor aggregation: Reduce a Graph to a NodeMap by aggregating all out-edge weights of each node
  • Edge Bundling: Convert a DirectedGraph into an UndirectedGraph; alternatively, convert a MultiGraph into a Graph
  • NodeMap reduction: Reduce values from all nodes into a scalar value
  • NodeMap filter: Apply a function to all value; keep only those nodes where the function returns True

These all require a callable of some sorts to be passed. It might be as simple as max, min, or sum, but it could also be a user-defined function.

Aggregators

Aggregation (or reduction) methods generally take a binary function: lambda x, y: x + y. There are some cases where all values are needed simultaneously -- for example, to calculate the median.

Initial Proposal:

  1. Define a set of common aggregation functions: min, max, sum, product, any, all, count
    • Algorithm implementers are expected to handle these common functions
  2. User-defined functions will be binary operators
    • These will be jitted by metagraph, then passed to the concrete algorithm
    • Algorithms should handle these jitted functions

Filters

Filters are more complicated that aggregators. Filters technically require a unary operator, but in practice are often binary operators with a scalar (>0, ==max_val, etc). Another common filter practice is to accept a boolean array as input.

Initial Proposal:

  1. Define a set fo common filter functions: gt, ge, lt, le, eq, ne
  2. Filter functions take either:
    a. User-defined unary function
    b. Common binary function with an extra scalar
    c. Boolean object of the type NodeMap (same as object being filtered)
  3. Create a compare method between two objects; accept common binary functions or user-defined binary functions. Could return a NodeSet instead of a boolean NodeMap.
  4. Create a select method which takes a NodeSet as input and only keeps those nodes

Pre-dispatcher

Some of these changes require metagraph to modify inputs prior to dispatching. Jitting functions is an example of this.

Having a layer which allows the User API to differ from the concrete implementation API should be used with caution, but seems necessary and useful in several algorithms.