nikic / PHP-Parser

A PHP parser written in PHP

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Planned changes to NodeTraverser

nikic opened this issue · comments

For PHP-Parser 5 I plan to make the following changes to NodeTraverser. The new interface will be

// Or maybe ConcurrentVisitor.
class InterleavedVisitor implements NodeVisitor {
    /** @param NodeVisitor[] $visitors */
    public function __construct(array $visitors) { ... }
    // ...
}

interface NodeTraverserInterface {
    /**
     * @param Node[] $nodes
     * @return Nodes[]
     */
    public function traverse(array $stmts, NodeVisitor $visitor): array;
}


interface NodeTraverser implements NodeTraverserInterface {
    /**
     * @param Node[] $nodes
     * @return Nodes[]
     */
    public function traverse(array $stmts, ?NodeVisitor $visitor = null): array;

    // For backwards-compatibility only.
    public function addVisitor(NodeVisitor $visitor): void { ... }
    public function removeVisitor(NodeVisitor $visitor): void { ... }
}

The usage will be as follows:

$traverser = new NodeTraverser();

// To run two visitors sequentially, one finishing completely before the other starts.
$stmts = $traverser->traverse($stmts, new CloningVisitor());
$stmts = $traverser->traverse($stmts, new NameResolver());

// To run two visitors concurrently (invoking all visitors for each node before visiting the next).
$stmts = $traverser->traverse($stmts, new InterleavedVisitor([
    new CloningVisitor(),
    new NameResolver(),
]));

// For backwards-compatibility, the following will also continue to work:
// This will construct the InterleavedVisitor internally, and do a concurrent visit.
$traverser->addVisitor(new CloningVisitor());
$traverser->addVisitor(new NameResolver());
$stmts = $traverser->traverse($stmts);

The motivation for this change is:

  1. Make the distinction between a sequential and concurrent visitor very clear. I got the impression that this is a common gotcha, because it's not clear from the NodeTraverser interface that visitors are concurrent, and you need to be careful about dependencies between visitors.
  2. Avoid the need to construct a separate NodeTraverser per visitor if you want to do sequential visits. Sequential and concurrent visitation should be equally simple.

A problem I ran into here is that the current NodeVisitor interface isn't flexible enough to handle all possible cases here. For example, let's say you have one visitor that return a new node, and then a second one that returns DONT_TRAVERSE_CHILDREN. If we combine these into one InterleavedVisitor, there is no way to both return a new node and DONT_TRAVERSE_CHILDREN.

I don't think it is this library's job to reconcile visitors with conflicting ideas about what to do. Tree rewriting is Turing complete. If one visitor can modify the AST in a way that violates another visitor's assumptions or invalidates the results of its work, then it is on the application to model and implement the right reconciliation protocol and backtracking and whatnot.

Now that visitors are invoked last-enter-first-leave, we can think of them as middlewares, with the outer middleware having the power to decide whether to invoke "previous" middlewares or make a "final decision" like DONT_TRAVERSE_CHILDREN. It is the application's job to make sure that the visitors in an InterleavedVisitor are ordered correctly according to that power.

Btw. in my experience splitting complex refactorings into separate passes was almost always the simplest and least error prone approach; even doing it break-and-repair style, e.g. add attributes first, then insert missing use statements for the attribute classes in a repair pass. Interleaved visiting is mostly useful for combining several non-conflicting, information-gathering visitors in one pass for better performance, or for combining "middleware" visitors that implement cross-pass protocols with a single visitor that does the actual work for the pass.

I don't think it is this library's job to reconcile visitors with conflicting ideas about what to do. Tree rewriting is Turing complete. If one visitor can modify the AST in a way that violates another visitor's assumptions or invalidates the results of its work, then it is on the application to model and implement the right reconciliation protocol and backtracking and whatnot.

So you're suggesting that if in the InterleavedVisitor, the first one returns a new node, and the second one return DONT_TRAVERSE_CHILDREN (or similar), that would be considered user error and the InterleavedVisitor should throw in that case?

So you're suggesting that if in the InterleavedVisitor, the first one returns a new node, and the second one return DONT_TRAVERSE_CHILDREN (or similar), that would be considered user error and the InterleavedVisitor should throw in that case?

No, I am suggesting that InterleavedVisitor cannot possibly know if this is a user error or not; I can imagine situations in which it is, and situations in which it isn't. Therefore it isn't InterleavedVisitor's job to assert any idea about correctness here; the only thing to do is to either specify what InterleavedVisitor will do in this case, i.e. turn it into defined behavior, or (possibly better, but possibly a rabbit hole) allow the application to specify the policy.

So you're suggesting that if in the InterleavedVisitor, the first one returns a new node, and the second one return DONT_TRAVERSE_CHILDREN (or similar), that would be considered user error and the InterleavedVisitor should throw in that case?

No, I am suggesting that InterleavedVisitor cannot possibly know if this is a user error or not; I can imagine situations in which it is, and situations in which it isn't. Therefore it isn't InterleavedVisitor's job to assert any idea about correctness here; the only thing to do is to either specify what InterleavedVisitor will do in this case, i.e. turn it into defined behavior

I don't think that's possible, because both defined behaviors (ignore one or the other action), are obviously wrong. We can either make this user error, or support it by performing both actions (which requires extending the NodeVisitor API to allow doing so).

I disagree. There is a well-defined timeline of who and when gets an opportunity to mutate the AST and/or affect traversal: outer enter(), inner enter(), inner leave(), outer leave(). You inform application writers that this is how the timeline is structured and there are no built-in facilities for time travel. If they want to implement time travel on top of your library, then it's entirely on them to avoid paradoxes, resolve paradoxes, and/or choose to rely on the timeline to resolve paradoxes, on a case by case basis.

[xkcd://guy_exits_time_machine_shoots_grandfather_and_says_DONT_TRAVERSE_CHILDREN.gif]

Two visitors that genuinely disagree about what to do never belong in a single InterleavedVisitor, full stop, no disagreement here. However if two visitors are in a relationship that goes like "I say X, but if that specific guy over there whom the engine will query later says Y, then that's fine", then they don't genuinely disagree, they in fact cooperate over the simple default protocol.

There is no time travel involved. The actions are sequential, not overriding. If one visitor returns a new node, that's the node the next visitor is going to see, which can in turn return a new node, etc. Similarly, if one visitor returns a new node, the next visitor can decide that it would like to stop further traversal. But that changes nothing about the node replacement that has already happened.

At least this is how things currently work. The action returned by a later visitor does not discard that of an earlier ones, they are both performed.

Ah, I think I get it: the problem is that there are two orthogonal "dimensions" to the possible response from a visitor: whether to replace or keep the node, and whether to alter the traversal or not. Currently there is no way to say "replace, skip children" as it was unlikely to be needed in the case of single-task visitors, but an InterleavedVisitor combines multiple single-task visitors and there is no way to combine two orthogonal responses, which should combine (as opposed to non-orthogonal ones which should override sequentially).

Since enterNode() / leaveNode() already return mixed (or will it be a union type in 5.0?), you could also allow returning a VisitorResponse object consisting of an optional node directive and an optional traversal directive. You would use this type internally, normalizing visitors' responses to it, and also accumulating them in InterleavedVisitor. (This assumes that you don't modify the AST within the loop over visitors in InterleavedVisitor, and "next visitor sees the new node" is achieved by updating a local variable in the loop. The node replacement in AST is performed by the traverser after the call to InterleavedVisitor returns, just like with any other visitor.)

I think I'm going to drop this change. I thought this design would make things simpler overall, but in practice this turns out to be a significant complication.