graphistry / pygraphistry

PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer

[FEA] GFQL indexing support

lmeyerov opened this issue

Is your feature request related to a problem? Please describe.

One of the easiest ways to speed up pandas code is to use indexes.
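
As a rough illustration of the pandas behavior in question (not GFQL code), the sketch below compares a boolean-mask scan with a lookup through an index built once up front. The `nodes` frame and its column names are made up for the example.

```python
import pandas as pd

# Toy node table; the column names are assumptions for illustration only.
nodes = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "type": ["user", "user", "host", "host"],
})

# Unindexed lookup: the boolean mask compares every row against the key.
scan_hit = nodes[nodes["id"] == "c"]

# Indexed lookup: build the index once, then reuse it for repeated lookups.
nodes_by_id = nodes.set_index("id")
index_hit = nodes_by_id.loc[["c"]]

# Both approaches find the same row.
same_rows = scan_hit["type"].tolist() == index_hit["type"].tolist()
```

The payoff only shows up when the same index serves many lookups, since building it is itself a pass over the data.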

Describe the solution you'd like

GFQL has a few interesting scenarios here:

  • Indexing on node/edge IDs wrt global lookups
  • Indexing on additional columns, especially text
  • Indexes being passed in
  • Indexing being requested
  • Indexes happening on-the-fly at start & mid-traversal
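
For the first scenario, ID indexing for global lookups, a minimal sketch in plain pandas could look like the following. The `hop` helper and the `src`/`dst` column names are hypothetical and are not part of the GFQL API.

```python
import pandas as pd

# Toy edge table; src/dst column names are assumptions for illustration.
edges = pd.DataFrame({
    "src": ["a", "a", "b", "c"],
    "dst": ["b", "c", "d", "d"],
})

# Index edges by source ID once; sorting lets pandas use faster label lookups.
edges_by_src = edges.set_index("src").sort_index()

def hop(frontier, edges_by_src):
    """Hypothetical helper: sorted destination IDs one hop out from frontier."""
    # Drop frontier IDs with no outgoing edges so .loc does not raise KeyError.
    present = [n for n in frontier if n in edges_by_src.index]
    if not present:
        return []
    # .loc on a non-unique index returns every edge row for each source ID.
    return sorted(edges_by_src.loc[present, "dst"].unique().tolist())

first = hop(["a"], edges_by_src)
second = hop(first, edges_by_src)
```

Each hop reuses the same prebuilt index, which is the "global lookup" case; whether this beats a merge depends on frontier size and would need measuring.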

It's unclear what's most important. I'm guessing:

  • node/edge ID indexing <-- may give near-parity w/ naive non-DF-based graph traversals
  • str indexing, esp for initial searches
  • some sort of triple support
  • Multicol indexing with some attribs of interest like node/edge type
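
To make the multicol item concrete, here is a sketch of a pandas `MultiIndex` over node type and ID. The table and its columns are invented for the example, and this says nothing about how GFQL would expose such an index.

```python
import pandas as pd

# Toy node table; column names are assumptions for illustration.
nodes = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "type": ["user", "user", "host", "host"],
    "name": ["alice", "bob", "web-1", "db-1"],
})

# Two-level index: node type first, then node ID. Sorting enables fast slicing.
nodes_by_type_id = nodes.set_index(["type", "id"]).sort_index()

# All nodes of one type: a lookup on the first level only.
hosts = nodes_by_type_id.loc["host"]

# One specific node: a lookup on the full (type, id) key.
one = nodes_by_type_id.loc[("user", "b")]
```

Putting the type first suits filters like "all hosts"; lookups by ID alone would still want a separate ID index.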

A tricky aspect here is global vs dynamic indexing. Ex:

  • ahead of time or at start, generate indexes on the 'global' node + edge DFs, and use them mid-traversal when sufficiently small, such as during enrichment
  • dynamic reindexing mid-traversal
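
One way to picture the global-index case is a per-step choice between the prebuilt index and a plain merge. The sketch below assumes "sufficiently small" refers to the frontier size, which is only one reading of the bullet above. The `expand` helper and the `SMALL_FRONTIER` cutoff are hypothetical, and dynamic reindexing is not shown.

```python
import pandas as pd

# Toy edge table; src/dst column names are assumptions for illustration.
edges = pd.DataFrame({
    "src": ["a", "a", "b", "c"],
    "dst": ["b", "c", "d", "d"],
})

# Global index, built once before the traversal starts.
edges_by_src = edges.set_index("src").sort_index()

SMALL_FRONTIER = 2  # hypothetical cutoff; a real value would come from measurement

def expand(frontier_ids, edges, edges_by_src, small=SMALL_FRONTIER):
    """Hypothetical helper: return (matching edges, name of the strategy used)."""
    frontier_ids = list(dict.fromkeys(frontier_ids))  # dedupe, keep order
    if len(frontier_ids) <= small:
        # Small frontier: probe the prebuilt global index.
        present = [n for n in frontier_ids if n in edges_by_src.index]
        if not present:
            return edges.iloc[0:0][["src", "dst"]], "index"
        out = edges_by_src.loc[present].reset_index()
        return out[["src", "dst"]], "index"
    # Large frontier: fall back to a merge over the unindexed edge table.
    frontier = pd.DataFrame({"src": frontier_ids})
    out = edges.merge(frontier, on="src", how="inner")
    return out[["src", "dst"]], "merge"

small_out, small_strategy = expand(["a"], edges, edges_by_src)
large_out, large_strategy = expand(["a", "b", "c"], edges, edges_by_src)
```

A fixed cutoff like this is about the simplest possible planning rule; anything smarter starts to look like a query planner.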

A lot of this gets into query planning, so another consideration is to identify something very simple now and defer the rest to a more structured planning system.