Insertion is currently O(N) on the number of children and DX is poor

Question


LeaVerou opened this issue 6 months ago · comments

Writing this comment made me realize something: we currently only store the pointer to a node’s parent. This means that replacing a child node is nontrivial, because finding out how we got from the parent to the child is nontrivial: we would need to try all possible child properties and compare. Even children() is no help here, since it doesn't give us a pointer to each of these children that we could use to replace them.
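To make the cost concrete, here is a minimal sketch of what replacing a child looks like when only a parent pointer is stored. The `childProperties` list and the `replaceChildSlow` name are hypothetical and only for illustration; they are not the library's actual API, and the node shapes merely resemble ESTree-style ASTs.

```javascript
// Hypothetical list of properties that may hold child nodes.
const childProperties = ["left", "right", "argument", "callee", "arguments"];

// Replace `child` with `replacement`, knowing only `child.parent`.
// Every candidate property (and every array element) of the parent
// has to be compared against the child, hence O(N).
function replaceChildSlow(child, replacement) {
  const parent = child.parent;
  if (!parent) return false;

  for (const property of childProperties) {
    const value = parent[property];
    if (value === child) {
      parent[property] = replacement;
    } else if (Array.isArray(value) && value.includes(child)) {
      value[value.indexOf(child)] = replacement;
    } else {
      continue; // not here, try the next property
    }
    replacement.parent = parent;
    return true;
  }
  return false;
}

const slowCall = { type: "CallExpression", callee: { type: "Identifier", name: "f" }, arguments: [] };
const slowOldArg = { type: "Literal", value: 1, parent: slowCall };
slowCall.arguments.push(slowOldArg);
const slowNewArg = { type: "Literal", value: 2 };
const slowReplaced = replaceChildSlow(slowOldArg, slowNewArg);
```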

Potential solutions:

1. Instead of the parent, store a { node, property, index } object (index only used for array properties) or a { node, path } object that contains all the information we need to get from the parent to the child.
   - Pros:
     - Insertion becomes O(1).
   - Cons:
     - This is a more complicated structure.
     - Any existing code that uses node.parent would break (but then again, it shouldn't use that directly).
2. In addition to the current parent property, also store parent_property and parent_index properties.
   - Pros:
     - Maintains the simplicity of the current parent pointers.
     - Insertion becomes O(1).
   - Cons:
     - Each node now carries 3 parent-related properties (parent, parent_property, parent_index), which could get out of sync.
     - Makes it harder to use a WeakMap instead of node properties.
3. Add a children() argument that makes it return a data structure that retains this info.
   - Pros:
     - Keeps the parent pointer simple.
   - Cons:
     - Insertion is still O(N) on the number of children.
4. Add a parents.pathTo(node) method that returns {property, index} or [property, index].
   - Pros & Cons: Largely the same as option 3.
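As a rough illustration of the first option, the sketch below stores a { node, property, index } object in place of the bare parent pointer, so a replacement can write straight into the right slot. The helper names `setParentPath` and `replaceChildFast` are hypothetical, and the sketch assumes indices are kept up to date by whatever code inserts into or removes from array properties.

```javascript
// Record how to get from `parentNode` to `child`.
// `index` is only stored for array-valued properties.
function setParentPath(child, parentNode, property, index) {
  child.parent = index === undefined
    ? { node: parentNode, property }
    : { node: parentNode, property, index };
}

// O(1): no scanning of the parent's properties is needed.
function replaceChildFast(child, replacement) {
  const { node, property, index } = child.parent;
  if (index === undefined) {
    node[property] = replacement;
  } else {
    node[property][index] = replacement;
  }
  replacement.parent = child.parent; // the replacement inherits the same path
  child.parent = undefined;          // the old child is now detached
}

// Usage: a single-valued property...
const fastSum = { type: "BinaryExpression", operator: "+" };
const fastLeft = { type: "Literal", value: 1 };
fastSum.left = fastLeft;
setParentPath(fastLeft, fastSum, "left");
const fastNewLeft = { type: "Identifier", name: "x" };
replaceChildFast(fastLeft, fastNewLeft);

// ...and an array-valued property.
const fastList = { type: "ArrayExpression", elements: [] };
const fastItem = { type: "Literal", value: 2 };
fastList.elements.push(fastItem);
setParentPath(fastItem, fastList, "elements", 0);
const fastNewItem = { type: "Literal", value: 3 };
replaceChildFast(fastItem, fastNewItem);
```

The trade-off raised in the list still applies: the path object must be updated whenever a node moves, otherwise a stale property or index would point at the wrong slot.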

Adam Janicki · Answer 1 · Tue Jan 16 2024 03:39:05 GMT+0800 (China Standard Time)

What does DX mean?

Adam Janicki · Answer 2 · Tue Jan 16 2024 03:39:32 GMT+0800 (China Standard Time)

Personally, I like option 1 the best

Lea Verou · Answer 3 · Tue Jan 16 2024 03:54:20 GMT+0800 (China Standard Time)

> What does DX mean?

https://en.wikipedia.org/wiki/User_experience#Developer_experience

> Personally, I like option 1 the best

Why?

Adam Janicki · Answer 4 · Tue Jan 16 2024 04:08:24 GMT+0800 (China Standard Time)

> Why?

First, I think it is important to be able to do this in O(1). Second, although it is a more complicated data structure, I think it is much less messy than the alternative of storing those properties flattened (option 2).

Lea Verou · Answer 5 · Tue Jan 16 2024 05:12:26 GMT+0800 (China Standard Time)

> I think this is important to be able to do in O(1)

Is it? What's the ballpark for N?

Adam Janicki · Answer 6 · Tue Jan 16 2024 05:48:48 GMT+0800 (China Standard Time)

> > I think this is important to be able to do in O(1)
>
> Is it? What's the ballpark for N?

In the average case, I'd guess around 2-5, but I was thinking it was important just because it would add up with all the calls to it in succession, which is what Mavo would need to do. But I'm open to whatever solution you deem most appropriate in balancing efficiency and DX

David Karger · Answer 7 · Tue Jan 16 2024 06:13:52 GMT+0800 (China Standard Time)

If your concern is specifically about replacing nodes, then another option would be that instead of replacing the node you overwrite all its properties with the properties of the intended replacement node. Still not O(1) but now you are iterating over child node properties instead of parent node properties, if that is helpful.

Kookier solution: give each node a "replacement" pointer. To replace the node, point its replacement pointer at the new node. Now when you are following some pointer to a node, and discover its replacement pointer is full, replace the pointer you followed with a pointer to the new replacement node. This will be O(1).
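A minimal sketch of how the replacement-pointer idea could work, assuming every read of a child goes through a helper. The names `replaceLazily` and `follow` and the `replacement` property are hypothetical, chosen only to illustrate the description above.

```javascript
// O(1) replace: just mark the old node as superseded.
function replaceLazily(oldNode, newNode) {
  oldNode.replacement = newNode;
}

// Read parent[property], resolving any pending replacement (or chain of
// replacements) and patching the pointer we followed along the way.
function follow(parent, property) {
  let target = parent[property];
  while (target && target.replacement) {
    target = target.replacement;
  }
  parent[property] = target;
  return target;
}

const lazyParent = { type: "UnaryExpression", operator: "-" };
const lazyOld = { type: "Literal", value: 1 };
lazyParent.argument = lazyOld;

const lazyNew = { type: "Literal", value: 2 };
replaceLazily(lazyOld, lazyNew);

// Until someone follows the pointer, the parent still references the old node.
const lazyBefore = lazyParent.argument;
const lazyAfter = follow(lazyParent, "argument");
```

The replacement itself is constant time, but the sketch also shows the cost: any code that reads `parent[property]` directly, without going through `follow`, would still see the old node.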

Adam Janicki · Answer 8 · Tue Jan 16 2024 06:49:43 GMT+0800 (China Standard Time)

> If your concern is specifically about replacing nodes, then another option would be that instead of replacing the node you overwrite all its properties with the properties of the intended replacement node. Still not O(1) but now you are iterating over child node properties instead of parent node properties, if that is helpful.

I think Lea typically recommends not using Object.assign/overwriting properties of nodes since it is messy, but this is still a possibility to consider

> Kookier solution: give each node a "replacement" pointer. To replace the node, point its replacement pointer at the new node. Now when you are following some pointer to a node, and discover its replacement pointer is full, replace the pointer you followed with a pointer to the new replacement node. This will be O(1).

Love this idea 😂 unfortunately probably a bit too complex, which will hurt the DX side of things

Adam Janicki · Answer 9 · Thu Jan 18 2024 01:14:19 GMT+0800 (China Standard Time)

Another thought: since we have the existing function in parents, parents.get, it's an easy swap to using this more complex data structure since we can just tweak that function
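A sketch of what that swap might look like, assuming the parent information lives in a WeakMap behind the accessor. Only the name parents.get comes from this thread; its real signature is not shown here, so the `set` and `path` methods and the `parentPaths` map are hypothetical.

```javascript
// child node -> { node, property, index }
const parentPaths = new WeakMap();

const parents = {
  // Hypothetical setter that records the full path.
  set(child, node, property, index) {
    parentPaths.set(child, { node, property, index });
  },
  // Callers of get() still receive just the parent node,
  // so existing call sites would not need to change.
  get(child) {
    return parentPaths.get(child)?.node;
  },
  // Hypothetical accessor for code that needs the full path.
  path(child) {
    return parentPaths.get(child);
  },
};

const pathRoot = { type: "Program", body: [] };
const pathStatement = { type: "ExpressionStatement" };
pathRoot.body.push(pathStatement);
parents.set(pathStatement, pathRoot, "body", 0);
```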

Lea Verou · Answer 10 · Thu Jan 18 2024 01:20:46 GMT+0800 (China Standard Time)

> If your concern is specifically about replacing nodes, then another option would be that instead of replacing the node you overwrite all its properties with the properties of the intended replacement node. Still not O(1) but now you are iterating over child node properties instead of parent node properties, if that is helpful.

As @adamjanicki2 said, this is very messy:

- It can break assumptions in calling code.
- It's generally an antipattern to gut objects and replace their properties, since it's tricky to handle all edge cases properly.

But most importantly, it wouldn't even solve the problem in every case, since the replacement is often to introduce a new node between the existing parent-child relationship (e.g. when prepending, see #32 ).

> Kookier solution: give each node a "replacement" pointer. To replace the node, point its replacement pointer at the new node. Now when you are following some pointer to a node, and discover its replacement pointer is full, replace the pointer you followed with a pointer to the new replacement node. This will be O(1).

I can see many cons to this, but not sure what the pros are. You list O(1) as one of them, but there are far less problematic solutions that also have this advantage. Am I missing something?

Adam Janicki · Answer 11 · Sun Jan 21 2024 02:35:14 GMT+0800 (China Standard Time)

@karger any more thoughts here?

Lea Verou · Answer 12 · Sun Jan 21 2024 02:57:50 GMT+0800 (China Standard Time)

> @karger any more thoughts here?

In terms of division of labor, @karger mainly weighs in with advice; he does not do the product management work of prioritizing or deciding which solution we'll go with.

But this is not a pressing issue: it's not really a blocker for any of the other stuff we're working on; it's only a performance optimization.

Adam Janicki · Answer 13 · Sun Jan 21 2024 09:50:13 GMT+0800 (China Standard Time)

Sounds good

Adam Janicki · Answer 14 · Sun Jan 28 2024 01:36:14 GMT+0800 (China Standard Time)

Coming back to this after more findings from #39, my vote is still solution 1, augmenting the parent data structure to contain the parent, property, and index

Lea Verou · Answer 15 · Mon Jan 29 2024 03:42:13 GMT+0800 (China Standard Time)

> Coming back to this after more findings from #39, my vote is still solution 1, augmenting the parent data structure to contain the parent, property, and index

Agreed, let's do it.