mavoweb / vastly

Everything you need to support a custom formula language

Home Page: https://vastly.mavo.io

Insertion is currently O(N) on the number of children and DX is poor

LeaVerou opened this issue.

Writing this comment made me realize something: we currently only store the pointer to a node’s parent. This means that replacing a child node is nontrivial, because finding out how we got from the parent to the child is itself nontrivial: we would need to try all possible child properties and compare. Even children() is no help here, since it doesn't give us a pointer to each of those children that would let us replace them.
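For illustration, here is a minimal TypeScript sketch of what replacement costs when only the parent pointer is stored. It assumes jsep-style AST nodes (plain objects with a `type` plus child properties such as `left`, `right`, or an `arguments` array) and models the parent store as a `WeakMap`; `AstNode` and `replaceChild` are hypothetical names, not vastly's actual API. Because the parent pointer does not record where the child lives, every property of the parent has to be scanned.

```typescript
// Hypothetical jsep-style AST node: a `type` plus arbitrary child
// properties, some of which may be arrays of nodes.
interface AstNode {
  type: string;
  [property: string]: unknown;
}

// Current approach (modeled as a WeakMap): only child -> parent is stored,
// not how to get from the parent to the child.
const parents = new WeakMap<AstNode, AstNode>();

// Replace `oldNode` in its parent by scanning every property (and every
// array element) of the parent until it is found: O(N) on the number
// of children.
function replaceChild(oldNode: AstNode, newNode: AstNode): boolean {
  const parent = parents.get(oldNode);
  if (!parent) return false; // root node, or not registered

  for (const property of Object.keys(parent)) {
    const value = parent[property];
    if (value === oldNode) {
      parent[property] = newNode;
    } else if (Array.isArray(value)) {
      const index = value.indexOf(oldNode);
      if (index === -1) continue; // this array does not hold oldNode
      value[index] = newNode;
    } else {
      continue; // this property does not hold oldNode
    }
    parents.delete(oldNode);
    parents.set(newNode, parent);
    return true;
  }
  return false;
}
```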

Potential solutions:

  1. Instead of the parent, store a { node, property, index } (index only used for array properties) or { node, path } object that contains all the information we need to get from the parent to the child.
    • Pros:
      • Insertion becomes O(1).
    • Cons:
      • This is a more complicated structure
      • Any existing code that uses node.parent would break (but then again, it shouldn't use that directly)
  2. In addition to the current parent property, also store parent_property and parent_index properties.
    • Pros:
      • Maintains the simplicity of the current parent pointers
      • Insertion becomes O(1)
    • Cons:
      • Means 3 parent-related properties on each node (parent, parent_property, parent_index), which could get out of sync.
      • Makes it harder to use a WeakMap instead of node properties.
  3. An argument to children() that makes it return a data structure that retains this info.
    • Pros:
      • Keeps the parent pointer simple
    • Cons:
      • Insertion still O(N) on the number of children
  4. A parents.pathTo(node) method that returns {property, index} or [property, index]
    • Pros & Cons: Largely same as 3
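A minimal TypeScript sketch of option 1, assuming jsep-style AST nodes and a `WeakMap`-backed parent store; `AstNode`, `ParentPath`, `setParent`, and `replaceNode` are hypothetical names. Each child records `{ node, property, index }`, so replacement is a direct write instead of a scan over the parent's properties.

```typescript
// Hypothetical jsep-style AST node.
interface AstNode {
  type: string;
  [property: string]: unknown;
}

// Option 1: everything needed to get from the parent to the child.
interface ParentPath {
  node: AstNode;     // the parent node
  property: string;  // the parent property that holds the child
  index?: number;    // position within that property, if it is an array
}

const parents = new WeakMap<AstNode, ParentPath>();

function setParent(child: AstNode, node: AstNode, property: string, index?: number): void {
  parents.set(child, { node, property, index });
}

// O(1): follow the stored path instead of scanning the parent's properties.
function replaceNode(oldNode: AstNode, newNode: AstNode): boolean {
  const path = parents.get(oldNode);
  if (!path) return false; // root node, or not registered

  const { node, property, index } = path;
  if (index === undefined) {
    node[property] = newNode;
  } else {
    (node[property] as AstNode[])[index] = newNode;
  }

  // The new node occupies exactly the same slot, so the path is reused.
  parents.delete(oldNode);
  parents.set(newNode, path);
  return true;
}
```

One caveat of storing `index`: inserting into or removing from an array property shifts the indices of later siblings, so their stored paths need updating too; a plain replacement like this one leaves them valid.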

What does DX mean?

Personally, I like option 1 the best

> What does DX mean?

https://en.wikipedia.org/wiki/User_experience#Developer_experience

> Personally, I like option 1 the best

Why?

> Why?

Reason 1 is that I think this is important to be able to do in O(1), and reason 2 is that, although it is a more complicated data structure, I think it is much less messy than the alternative of storing those properties flattened (option 2).
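For comparison, a sketch of the flattened alternative (option 2), with hypothetical `parent`, `parent_property`, and `parent_index` fields stored directly on each jsep-style node, as that option describes; `setParent` and `replaceNode` are illustrative names. Replacement is still O(1), but every update has to write all three fields together, and any code path that updates only some of them leaves the node inconsistent.

```typescript
// Option 2: the parent information is flattened onto each node.
interface AstNode {
  type: string;
  parent?: AstNode;
  parent_property?: string;
  parent_index?: number; // only for array properties
  [property: string]: unknown;
}

// Every update must write all three fields together; setting only some
// of them would leave the node inconsistent.
function setParent(child: AstNode, parent: AstNode | undefined, property?: string, index?: number): void {
  child.parent = parent;
  child.parent_property = property;
  child.parent_index = index;
}

function replaceNode(oldNode: AstNode, newNode: AstNode): boolean {
  const { parent, parent_property: property, parent_index: index } = oldNode;
  if (!parent || property === undefined) return false; // root or detached

  if (index === undefined) {
    parent[property] = newNode;
  } else {
    (parent[property] as AstNode[])[index] = newNode;
  }

  setParent(newNode, parent, property, index);
  setParent(oldNode, undefined); // detach the old node: again all three fields
  return true;
}
```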

> I think this is important to be able to do in O(1)

Is it? What's the ballpark for N?

> > I think this is important to be able to do in O(1)
>
> Is it? What's the ballpark for N?

In the average case, I'd guess around 2-5, but I thought it was important because the cost would add up over the many successive calls Mavo would need to make. But I'm open to whatever solution you think best balances efficiency and DX.

If your concern is specifically about replacing nodes, then another option would be that instead of replacing the node you overwrite all its properties with the properties of the intended replacement node. Still not O(1) but now you are iterating over child node properties instead of parent node properties, if that is helpful.
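A sketch of the overwrite-in-place idea, assuming jsep-style nodes (`AstNode` and `overwriteNode` are hypothetical). The parent's pointer is never touched; instead the target node is emptied and refilled from the replacement, so the cost depends on the number of properties of the two nodes rather than on the number of children of the parent.

```typescript
// Hypothetical jsep-style AST node.
interface AstNode {
  type: string;
  [property: string]: unknown;
}

// "Replace" `target` by mutating it into a shallow copy of `replacement`.
// No search over the parent's children is needed, because the parent
// keeps pointing at the same object.
function overwriteNode(target: AstNode, replacement: AstNode): void {
  // Remove every own property first so none of the old ones linger.
  for (const key of Object.keys(target)) {
    delete (target as Record<string, unknown>)[key];
  }
  Object.assign(target, replacement);
}
```

Because the object keeps its identity, any other code holding a reference to the original node now silently sees the replacement's contents.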

Kookier solution: give each node a "replacement" pointer. To replace the node, point its replacement pointer at the new node. Now when you are following some pointer to a node, and discover its replacement pointer is full, replace the pointer you followed with a pointer to the new replacement node. This will be O(1).
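A sketch of the replacement-pointer idea, assuming jsep-style nodes and, for brevity, only non-array child properties (`AstNode`, the `replacement` field, `replaceNode`, and `getChild` are hypothetical). Replacing only sets a forwarding pointer; reads go through `getChild`, which follows any chain of replacements and then rewrites the pointer it followed, similar in spirit to path compression.

```typescript
// Hypothetical jsep-style AST node with a forwarding pointer.
interface AstNode {
  type: string;
  replacement?: AstNode; // set once this node has been replaced
  [property: string]: unknown;
}

// O(1) replacement: only the forwarding pointer is written.
function replaceNode(oldNode: AstNode, newNode: AstNode): void {
  oldNode.replacement = newNode;
}

// Read a (non-array) child property, following any chain of forwarding
// pointers, then rewrite the parent's pointer so later reads go straight
// to the current node.
function getChild(parent: AstNode, property: string): AstNode | undefined {
  let child = parent[property] as AstNode | undefined;
  if (!child) return undefined;
  let next = child.replacement;
  while (next) {
    child = next;
    next = child.replacement;
  }
  parent[property] = child;
  return child;
}
```

The catch is that every traversal has to read children through `getChild` rather than plain property access, and a cycle of replacement pointers would loop forever.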

> If your concern is specifically about replacing nodes, then another option would be that instead of replacing the node you overwrite all its properties with the properties of the intended replacement node. Still not O(1) but now you are iterating over child node properties instead of parent node properties, if that is helpful.

I think Lea typically recommends against using Object.assign or otherwise overwriting node properties, since it is messy, but this is still a possibility to consider.

> Kookier solution: give each node a "replacement" pointer. To replace the node, point its replacement pointer at the new node. Now when you are following some pointer to a node, and discover its replacement pointer is full, replace the pointer you followed with a pointer to the new replacement node. This will be O(1).

Love this idea 😂 Unfortunately, it's probably a bit too complex, which will hurt the DX side of things.

Another thought: since we already have an existing function in parents, parents.get, switching to this more complex data structure is easy; we can just tweak that function.
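A sketch of that idea: the store keeps option 1's full path internally, while `parents.get()` keeps returning just the parent node, so existing callers are unaffected. The internals, the `set()` signature, and the shape of the returned path are assumptions for illustration, assuming jsep-style nodes; `pathTo` reuses the name proposed in option 4.

```typescript
// Hypothetical jsep-style AST node.
interface AstNode {
  type: string;
  [property: string]: unknown;
}

// Option 1's path from a parent to one of its children.
interface ParentPath {
  node: AstNode;
  property: string;
  index?: number; // only for array properties
}

// Hypothetical internal storage: full paths, keyed by child node.
const paths = new WeakMap<AstNode, ParentPath>();

const parents = {
  set(child: AstNode, node: AstNode, property: string, index?: number): void {
    paths.set(child, { node, property, index });
  },
  // Same contract as before: just the parent node, so callers don't break.
  get(child: AstNode): AstNode | undefined {
    return paths.get(child)?.node;
  },
  // New accessor for callers that need to replace the child.
  pathTo(child: AstNode): ParentPath | undefined {
    return paths.get(child);
  },
};
```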

> If your concern is specifically about replacing nodes, then another option would be that instead of replacing the node you overwrite all its properties with the properties of the intended replacement node. Still not O(1) but now you are iterating over child node properties instead of parent node properties, if that is helpful.

As @adamjanicki2 said, this is very messy:

  1. It can break assumptions in calling code
  2. It's generally an antipattern to gut objects and replace their properties, since it's tricky to handle all edge cases properly.

But most importantly, it wouldn't even solve the problem in every case, since replacement often means introducing a new node between an existing parent and child (e.g. when prepending, see #32).
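To illustrate introducing a new node between an existing parent and child, here is a sketch using option 1's `{ node, property, index }` paths on jsep-style nodes (`AstNode`, `ParentPath`, and `wrapNode` are hypothetical, and wrapping an operand in a unary minus is only an example). Overwriting the child in place is awkward here, because the original child object has to keep its identity while moving one level down.

```typescript
// Hypothetical jsep-style AST node.
interface AstNode {
  type: string;
  [property: string]: unknown;
}

// Option 1's path from a parent to one of its children.
interface ParentPath {
  node: AstNode;
  property: string;
  index?: number; // only for array properties
}

const parents = new WeakMap<AstNode, ParentPath>();

// Insert `wrapper` between `child` and its current parent, storing `child`
// under `wrapper[property]`. Both pointer updates are O(1) thanks to the path.
function wrapNode(child: AstNode, wrapper: AstNode, property: string): void {
  const path = parents.get(child);
  if (path) {
    // Point the old parent at the wrapper, in the slot the child occupied.
    const { node, property: slot, index } = path;
    if (index === undefined) {
      node[slot] = wrapper;
    } else {
      (node[slot] as AstNode[])[index] = wrapper;
    }
    parents.set(wrapper, path);
  }
  // The child keeps its identity and moves one level down.
  wrapper[property] = child;
  parents.set(child, { node: wrapper, property });
}
```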

> Kookier solution: give each node a "replacement" pointer. To replace the node, point its replacement pointer at the new node. Now when you are following some pointer to a node, and discover its replacement pointer is full, replace the pointer you followed with a pointer to the new replacement node. This will be O(1).

I can see many cons to this, but I'm not sure what the pros are. You list O(1) as one of them, but there are far less problematic solutions that also have this advantage. Am I missing something?

@karger any more thoughts here?

> @karger any more thoughts here?

In terms of division of labor, @karger mainly weighs in with advice; he does not do the product management work of prioritizing or deciding which solution we'll go with.

But this is not a pressing issue. It’s not really a blocker for any of the other stuff we're working on; it's only a performance optimization.

Sounds good

Coming back to this after more findings from #39, my vote is still solution 1, augmenting the parent data structure to contain the parent, property, and index

> Coming back to this after more findings from #39, my vote is still solution 1, augmenting the parent data structure to contain the parent, property, and index

Agreed, let's do it.