`map()` should not mutate the original AST

LeaVerou opened this issue 7 months ago

Analogously to array.map(), map() should not mutate the original AST. Instead, if no transformations are relevant, it should return a shallow clone of the node (so if there is no callback, or the callback doesn't return anything, map() is equivalent to a clone).

(Shallow, because the clone becomes deep progressively anyway as the method recurses, so deep cloning up front would be duplicated effort.)
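The claim that per-node shallow copies add up to a deep copy can be sketched as follows. This is a minimal illustration, not Vastly's code: it assumes ESTree-style nodes (plain objects with a string `type`, whose children sit in properties holding either a node or an array of nodes), and `isNode` and `cloneAst` are hypothetical names.

```javascript
// Hypothetical helper: treats any object with a string `type` as an AST node.
function isNode(value) {
  return value !== null && typeof value === "object" && typeof value.type === "string";
}

// Recursive clone built only from shallow copies: each call copies one node,
// then replaces that copy's child references with clones of the children,
// so once recursion finishes the result shares no nodes with the input.
function cloneAst(node) {
  const copy = { ...node }; // shallow: children are still shared at this point
  for (const [key, value] of Object.entries(copy)) {
    if (Array.isArray(value)) {
      copy[key] = value.map(item => (isNode(item) ? cloneAst(item) : item));
    } else if (isNode(value)) {
      copy[key] = cloneAst(value);
    }
  }
  return copy;
}

// 2 + 5, written out by hand as an ESTree-style tree.
const ast = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Literal", value: 2, raw: "2" },
  right: { type: "Literal", value: 5, raw: "5" },
};

const copy = cloneAst(ast);
copy.left.value = 200;
console.log(ast.left.value); // 2: the original tree is untouched
```

Each node is spread-copied exactly once, which is why an extra deep clone up front would only repeat work.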

Lea Verou · Answer 1 · Wed Dec 06 2023 12:53:53 GMT+0800 (China Standard Time)

A few design issues to decide on:

1. Do we need to deep clone ignored nodes, or is a shallow clone sufficient there?
2. Do we still clone nodes returned from the callback, or do we assume these are already new nodes that don't need cloning?

@adamjanicki2 what do you think? @karger I'd love your opinion if you can figure out the context.

Adam Janicki · Answer 2 · Wed Dec 06 2023 22:44:35 GMT+0800 (China Standard Time)

@LeaVerou I went with no copy because JavaScript's map() does not clone mapped elements. For example:

```javascript
const arr = [{foo: 5}];
const arr2 = arr.map(el => el);
arr2[0].foo = 500; // elements of arr and arr2 are aliased, so this mutates the underlying object
console.log(arr); // prints [{foo: 500}]
```

Personally, I believe it's up to the implementer to do any necessary copying in the callback, especially since the callback may return a nested node anyway, in which case we'd only be creating a shallow copy. Either way, we need to make it clear in the documentation what the intended behavior is, so once we come to a decision, I will update the function specs to spell it out clearly.

Lea Verou · Answer 3 · Wed Dec 06 2023 23:10:28 GMT+0800 (China Standard Time)

Yeah, I'm leaning towards:

1. I think a shallow copy should be fine, since the structure of that subtree is not altered.
2. Agreed, the callback should take care of that.
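A minimal sketch of map() following these two decisions, under stated assumptions: nodes are ESTree-style objects with a string `type`, a callback result of `undefined` means "leave this node alone", and a node returned by the callback is neither cloned nor descended into (that last point is an assumption, not something settled in this thread). `isNode` is a hypothetical helper.

```javascript
// Hypothetical helper: treats any object with a string `type` as an AST node.
function isNode(value) {
  return value !== null && typeof value === "object" && typeof value.type === "string";
}

// Sketch: callback(node) returning undefined means "no change".
function map(node, callback) {
  const replacement = callback ? callback(node) : undefined;
  if (replacement !== undefined) {
    // Decision 2: the callback's result is used as-is. If it returns an
    // existing node, that node stays shared with the original tree,
    // just as arr.map(el => el) shares its elements.
    return replacement;
  }
  // Decision 1: an untouched node gets a shallow copy, and its children are
  // mapped recursively, so the copy becomes deep one level at a time.
  const copy = { ...node };
  for (const [key, value] of Object.entries(copy)) {
    if (Array.isArray(value)) {
      copy[key] = value.map(item => (isNode(item) ? map(item, callback) : item));
    } else if (isNode(value)) {
      copy[key] = map(value, callback);
    }
  }
  return copy;
}

const ast = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Literal", value: 2, raw: "2" },
  right: { type: "Literal", value: 5, raw: "5" },
};

// Replace literals with fresh nodes; every other node is shallow-copied.
const doubled = map(ast, node =>
  node.type === "Literal"
    ? { ...node, value: node.value * 2, raw: String(node.value * 2) }
    : undefined
);
console.log(doubled.left.value, ast.left.value); // 4 2
```

With an identity callback, `map(ast, node => node)` returns `ast` itself, mirroring how `arr.map(el => el)` shares its elements; `map(ast)` with no callback returns a fully independent copy.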

> Either way, we need to make it clear in the documentation what the intended behavior is, so once we come to a decision, I will update the function specs to spell it out clearly.

Perfect, thank you!

Lea Verou · Answer 4 · Thu Dec 07 2023 02:20:40 GMT+0800 (China Standard Time)

@adamjanicki2 Let's try to keep the design discussion here.

> Quasi-related: I wonder if transform() might be a better name than map(). I chose map() because it relates to an existing concept authors may be familiar with, but if the two don't have enough in common, the name could hinder rather than improve usability.

> I definitely support this. To me, when I think of map, I think of something linear, not a complex tree structure like the one we have.

a) I think you're thinking of implementation complexity, but it doesn't necessarily follow that the API exposed to users is equally complex (if we do our job well :)
b) I'm not sure mapping conceptually requires linearity.

Nevertheless, I'm now wondering if it may be better to expose two functions: transform(), which modifies the AST in place, and map(), which does what we're discussing here. And potentially even a clone() that is just map() with no callback, as sugar.
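For contrast, an in-place transform() might look like the hypothetical sketch below, assuming ESTree-style nodes and the convention that a callback returning `undefined` leaves a node unchanged. Instead of copying, it writes each replacement back into the existing parent object, so the caller's tree changes.

```javascript
// Hypothetical helper: treats any object with a string `type` as an AST node.
function isNode(value) {
  return value !== null && typeof value === "object" && typeof value.type === "string";
}

// In-place sketch: returns the (possibly replaced) root and overwrites
// child references on the existing objects instead of copying them.
function transform(node, callback) {
  const replacement = callback(node);
  if (replacement !== undefined) return replacement; // used as-is
  for (const [key, value] of Object.entries(node)) {
    if (Array.isArray(value)) {
      value.forEach((item, i) => {
        if (isNode(item)) value[i] = transform(item, callback); // mutates the array
      });
    } else if (isNode(value)) {
      node[key] = transform(value, callback); // mutates the original parent
    }
  }
  return node;
}

const ast = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Literal", value: 2, raw: "2" },
  right: { type: "Literal", value: 5, raw: "5" },
};

const result = transform(ast, node =>
  node.type === "Literal"
    ? { ...node, value: node.value + 2, raw: String(node.value + 2) }
    : undefined
);
console.log(result === ast, ast.left.value); // true 4
```

Under this split, clone() would simply be map() called without a callback.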

Adam Janicki · Answer 5 · Thu Dec 07 2023 03:06:17 GMT+0800 (China Standard Time)

Some comments:

1. I like the idea of having map(), which creates fresh node objects, transform(), which works in place, and, trivially, clone(). That all sounds good to me.
2. The alternative implementation you wrote does not work. The main issue with it, and the reason my code looks fairly complex, is that node[prop] is sometimes an array, so we can't just pass it to the recursive call of _map as the node argument. That's why my function has the Array.isArray check, and why we need flatMap in children.
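To make the array case concrete: in an ESTree-style AST, a `CallExpression` keeps its callee under `callee` (a single node) but its arguments under `arguments` (an array). A hypothetical `children()` helper can normalize both shapes with `flatMap`; this is a sketch of the idea, not the actual code from the PR.

```javascript
// Hypothetical helper: treats any object with a string `type` as an AST node.
function isNode(value) {
  return value !== null && typeof value === "object" && typeof value.type === "string";
}

// Collect the direct child nodes of `node`. Some properties hold one node,
// others an array of nodes, so each value is normalized to an array and
// flatMap merges the results into one flat list.
function children(node) {
  return Object.values(node)
    .flatMap(value => (Array.isArray(value) ? value : [value]))
    .filter(isNode);
}

// f(1, x): one node under `callee`, an array under `arguments`.
const call = {
  type: "CallExpression",
  callee: { type: "Identifier", name: "f" },
  arguments: [
    { type: "Literal", value: 1, raw: "1" },
    { type: "Identifier", name: "x" },
  ],
};

console.log(isNode(call.arguments)); // false: an array is not itself a node
console.log(children(call).map(child => child.type));
// [ 'Identifier', 'Literal', 'Identifier' ]
```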

Adam Janicki · Answer 6 · Thu Dec 07 2023 03:27:49 GMT+0800 (China Standard Time)

Additionally, I think we should reconsider the o.only and o.except parameters. Currently, if a given node does not pass those filters, we ignore its entire subtree. Maybe I am missing something, but I'm struggling to see how this is useful. Because entire subtrees are skipped, what happens if the o.only argument is a simple type, such as Literal? The majority of the time, Literals lie towards the bottom of trees, and would likely be ignored.

Take for example the expression 2 + 5. It's a simple binary expression where left and right are both literals. Consider the following code:

```javascript
const ast = parse("2+5");
const newAst = map(ast, node => node, {only: "Literal"});
```

map() would see that the root of the tree is not a Literal and not walk down any of it, which seems kind of useless. Instead, I could see myself finding the following useful:

```javascript
// code that adds 2 to all literals
const ast = parse("2+5");
const newAst = map(ast, node => ({...node, value: node.value + 2}), {only: "Literal"});
```

The idea is that this code would call the callback only on matching nodes, but still explore the subtrees of nodes of other types. @LeaVerou thoughts on this? Am I missing something?
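The difference can be sketched with two hypothetical walkers over the `2+5` tree (hand-built here instead of calling `parse()`). `walkPruning` mimics the current behavior described above, where a node that fails the `only` filter causes its whole subtree to be skipped; `walkFiltering` calls the callback only on matching nodes but still descends everywhere. Both are simplified illustrations that take a single type name and collect values rather than reproducing the real map().

```javascript
// Hypothetical helper: treats any object with a string `type` as an AST node.
function isNode(value) {
  return value !== null && typeof value === "object" && typeof value.type === "string";
}

// Direct child nodes, whether stored as a single node or an array of nodes.
function children(node) {
  return Object.values(node)
    .flatMap(value => (Array.isArray(value) ? value : [value]))
    .filter(isNode);
}

// Current semantics: a non-matching node prunes its entire subtree.
function walkPruning(node, callback, only) {
  if (node.type !== only) return;
  callback(node);
  children(node).forEach(child => walkPruning(child, callback, only));
}

// Proposed semantics: non-matching nodes are skipped, but their children
// are still visited.
function walkFiltering(node, callback, only) {
  if (node.type === only) callback(node);
  children(node).forEach(child => walkFiltering(child, callback, only));
}

// The tree parse("2+5") is expected to produce, written out by hand.
const ast = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Literal", value: 2, raw: "2" },
  right: { type: "Literal", value: 5, raw: "5" },
};

const pruned = [];
walkPruning(ast, node => pruned.push(node.value), "Literal");
const filtered = [];
walkFiltering(ast, node => filtered.push(node.value), "Literal");
console.log(pruned, filtered); // [] [ 2, 5 ]
```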

Lea Verou · Answer 7 · Thu Dec 07 2023 04:13:24 GMT+0800 (China Standard Time)

> The majority of the time, Literals lie towards the bottom of trees, and would likely be ignored.

Not the majority of the time, 100% of the time, since Literals have no children, so by definition they can only be leaf nodes 😅

The except parameter came from the ignore option in Mavo.Script.walk(), but I cannot find any instances of it being used in the Mavo codebase (could you verify?). I must have added the only option when I ported it to Vastly, which is exactly the kind of thing one should not do (adding things for completeness, without concrete use cases). Assuming we cannot find actual use cases, the sane thing is probably to remove them.

That said, the kind of filtering you are proposing can easily be implemented with an if in the callback, so it's essentially sugar. However, not descending into certain subtrees is much harder to implement in the callback, and it can simplify logic quite a lot for use cases where you know upfront that entire subtrees can be excluded (e.g. you're only interested in function callees), so I'd be inclined to add something for that. It can be much simpler than only and except, though, e.g. a function returning a boolean. We can then add more conveniences/granularity as use cases arise. What do you think?

Adam Janicki · Answer 8 · Thu Dec 07 2023 04:30:33 GMT+0800 (China Standard Time)

OK, I see. If we keep those additional options, we should rename them to make it clear that they skip entire subtrees, not just a single node.

> but I cannot find any instances of it being used in the Mavo codebase (could you verify?)

I just checked, and there are no instances of ignore being used in Mavo. For Vastly, should we omit them for now, and add them back if we get enough concrete cases where ignoring subtrees is useful?

Perhaps instead we could have a parameter like o.ignoreSubtree of the form node => boolean, which returns true if a node's subtree should be ignored and false otherwise. This at least removes some of the ambiguity in the naming of the current o.ignore, and eliminates the current issue where ignore can be either a string or a function, so we have to use that matches function.
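A minimal sketch of how such an option could behave, assuming it is passed as `o.ignoreSubtree` and checked before the callback runs, so an ignored node is skipped together with all its descendants. The `walk()` name, the check order, and the node shapes are assumptions for illustration, not settled API.

```javascript
// Hypothetical helper: treats any object with a string `type` as an AST node.
function isNode(value) {
  return value !== null && typeof value === "object" && typeof value.type === "string";
}

// Hypothetical walker: if o.ignoreSubtree(node) returns true, neither the node
// nor any of its descendants is passed to the callback.
function walk(node, callback, o = {}) {
  if (o.ignoreSubtree?.(node)) return;
  callback(node);
  for (const value of Object.values(node)) {
    const items = Array.isArray(value) ? value : [value];
    items.filter(isNode).forEach(child => walk(child, callback, o));
  }
}

// a + f(b)
const ast = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Identifier", name: "a" },
  right: {
    type: "CallExpression",
    callee: { type: "Identifier", name: "f" },
    arguments: [{ type: "Identifier", name: "b" }],
  },
};

// Collect identifiers, but never look inside function calls.
const names = [];
walk(
  ast,
  node => { if (node.type === "Identifier") names.push(node.name); },
  { ignoreSubtree: node => node.type === "CallExpression" }
);
console.log(names); // [ 'a' ]
```

Because the option is a plain predicate, there is no string form to special-case and no need for a separate matching helper: ignoring several types is just `node => ["CallExpression", "MemberExpression"].includes(node.type)`.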

@LeaVerou should we keep it as is and come back to this later, or should we address it in this PR?

Adam Janicki · Answer 9 · Thu Dec 07 2023 22:20:57 GMT+0800 (China Standard Time)

@LeaVerou thoughts on doing any of the above in this PR? Or we can wait, think about it, and come back to this in another one.