posthtml / posthtml-parser

Parse HTML/XML to PostHTMLTree

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

AST change: introducing a `parent` property

RomFouq opened this issue · comments

First of all, thank you for PostHTML and its companion projects!

I am currently working on a plugin that needs frequent access to parent nodes.
I would like to propose a couple of changes to the AST generated by https://github.com/posthtml/posthtml-parser, to make it easier to use and faster for some use cases.

Introducing a parent property to the Node type, referencing the parent node, would for instance make posthtml/posthtml#226 more straightforward to solve.
Sometimes matching the parent node then handling the child is not possible, e.g. when the child may or may not have a parent node, forcing to handle the case where the child is an orphan separately.

Of course it is possible to insert this property in each node after the tree has been generated by the parser, but this requires traversing the whole tree, which can have a significant impact on performance for big trees.

To introduce this property, it would be needed to change the way HTML text nodes are encoded in the AST, as it is not possible to append an additional property to JS string objects.
HTML text nodes could for instance be encoded in the form { text: string | number, parent?: NodeTag | Node[] }, unfortunately breaking backward compatibility for the parser (but https://github.com/posthtml/posthtml-render could still be backward compatible) but I think this is inevitable.

I have working code for both https://github.com/posthtml/posthtml-parser and https://github.com/posthtml/posthtml-render with updated type definitions and passing unit tests and can make a couple of PRs if you would want to take a look.

@RomFouq Hi, thank you for your interest. I propose to proceed as follows:

  1. Create a plugin that implements the required functionality.

Of course it is possible to insert this property in each node after the tree has been generated by the parser, but this requires traversing the whole tree, which can have a significant impact on performance for big trees.

  1. Include this plugin to the core system and release the alpha version.
  2. Create a PR in the alpha version that replaces the plugin for the implementation of the feature.

I just made a PR introducing the parent property, updating tests and the documentation.
I am not sure why you wanted to implement it as a plugin first; the opposite plugin (restoring the old AST format for plugins down the plugin chain) could ease the transition for existing plugins though.

I just made a PR introducing the parent property, updating tests and the documentation.

Thank you good work, I will consider your proposal. most likely it will initially go as an alpha version

I am not sure why you wanted to implement it as a plugin first; the opposite plugin (restoring the old AST format for plugins down the plugin chain) could ease the transition for existing plugins though.

So that integration and migration would not be a surprise for other users

the main disadvantage of this approach is described here posthtml/posthtml#226 (comment)

@RomFouq do you have motivation why it is important and necessary?

Now it seems that this is a special case that can be implemented with a regular api

Having access to a parent property makes it possible to traverse the tree upwards, starting at a given node and going up one node at a time.

This is way less computationally expensive when searching for a node that must be an ancestor of a given node, for instance to implement a function similar to Element.closest. Without it, traversing the whole tree is needed, instead of just the part between the given node and the ancestor (or the root node if the ancestor is not found).

The technical risks and opportunities are clear. Are there any cases where you can't do without it? or cases that show that it is beneficial?

Now it looks like a feature that will not be particularly in demand.