splitgraph / seafowl

Analytical database for data-driven Web applications 🪶

Home Page:https://seafowl.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

chore: Refactor SeafowlContext

gruuya opened this issue · comments

The core Seafowl logic found in DefaultSeafowlContext struct implementation and the accompanying SeafowlContext trait is long overdue for a refactor.

  • On one hand, there's no need to have a separate SeafowlContext trait which can be confusing to newcomers, as all logic should really go into DefaultSeafowlContext (which should then be renamed to SeafowlContext).
    The only reason this was kept was because there are recursive calls inside SeafowlContext::create_physical_plan in some cases, and since this is an async function this only works for traits, whereas for structs it leads to compilation issues: https://rust-lang.github.io/async-book/07_workarounds/04_recursion.html
    However, this can easily be resolved in any of three ways:
    • Instead of calling self.create_physical_plan call self.inner.state().create_physical_plan, thus breaking the recursion. This is safe to do, since currently all custom planning in SeafowlContext::create_physical_plan results in a dummy (empty) plan and those are never used as inputs anywhere else (i.e. we only handle root plan nodes there, and the input planning can be delegated to DataFusion).
    • If we do end up needing to do actual recursion there at some point then it's enough to decorate the method with #[async_recursion] as is done in DataFusion.
    • Finally if we don't want to rely on a third party crate to facilitate this recursion there's always the async move + BoxFuture route.
  • On the other hand the entire logic is kept in a single file with 2.5K+ lines (with tests), while it really should be broken up into logically distinct groups of impls (i.e. logical planning, physical planning, delta operations etc.) across several files in a separate module.