AST improvements
alexander-akait opened this issue · comments
Hello, thanks for the great project. I am from the webpack organization. We want to improve our integration with HTML, but we ran into some difficulties due to insufficient information in the AST, and we would like to help improve this.
- Update startIndex and endIndex on onattribute
Our concept is to transform HTML to JS:
file.html
<div>
<h1>Text</h1>
<img src="./image.png" alt="alt" />
</div>
Transform to:
import file from './image.png';
export default "<div><h1>Text</h1><img src=\"" + file + "\" alt=\"alt\" /></div>";
But we lack some information: the positions of the attribute values, which we need in order to replace them with the imported content.
For now we use a hack: https://github.com/webpack-contrib/html-loader/blob/master/src/plugins/source-plugin.js#L436
{
  onattribute(name, value) {
    const endIndex = parser._tokenizer._index;
    const startIndex = endIndex - value.length;
    const unquoted = html[endIndex] !== '"' && html[endIndex] !== "'";
    attributesMeta[name] = { startIndex, unquoted };
  }
}
But it is a very hacky and dirty solution. I think it will be very easy to fix, and I can help with this.
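To illustrate why the value positions matter, here is a minimal sketch of the transform described above, assuming the positions are already known. The helper name htmlToModule, the shape of the sources entries, and the generated ___HTML_IMPORT_N___ identifiers are all hypothetical and are not taken from html-loader.

```javascript
// Hypothetical helper: turn HTML plus known value positions into a JS module.
// Each entry in `sources` is { startIndex, endIndex, request }, where the
// indices delimit the attribute value (without quotes) inside `html` and
// endIndex is exclusive.
function htmlToModule(html, sources) {
  const imports = [];
  const parts = [];
  let cursor = 0;

  // Process in document order so the slices never overlap.
  const ordered = [...sources].sort((a, b) => a.startIndex - b.startIndex);

  ordered.forEach((source, i) => {
    const name = `___HTML_IMPORT_${i}___`;
    imports.push(`import ${name} from ${JSON.stringify(source.request)};`);
    // JSON.stringify escapes quotes and newlines in the static HTML chunk.
    parts.push(JSON.stringify(html.slice(cursor, source.startIndex)));
    parts.push(name);
    cursor = source.endIndex;
  });
  parts.push(JSON.stringify(html.slice(cursor)));

  return `${imports.join("\n")}\nexport default ${parts.join(" + ")};`;
}

const html = '<img src="./image.png" alt="alt" />';
// indexOf stands in for the position a parser would report.
const start = html.indexOf("./image.png");
const moduleCode = htmlToModule(html, [
  { startIndex: start, endIndex: start + "./image.png".length, request: "./image.png" },
]);
console.log(moduleCode);
```

Without reliable startIndex and endIndex values for each attribute, the slicing step has nothing to work from, which is what the hack above tries to reconstruct.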
- No information about quotes.
A developer can set any transformer for import file from './image.png'; and even change the name of the file, so we need specific logic to decide whether to keep the quotes or to insert them when the filename contains characters that are not allowed in an unquoted value (for example a space in the name: image of something.png).
You can see our hack above. It would be great to add this information and improve onattribute like this:
{
  onattribute(name, value, quotes) {}
}
The quotes argument can be:
- undefined - no quotes
- ' or " - type of quotes
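As a rough sketch of how a consumer might use such an argument, the hypothetical helper below decides whether a replaced value can stay unquoted. It assumes the quotes values proposed above; the character class covers whitespace, quotes, =, <, > and the backtick, which the HTML syntax does not allow in unquoted attribute values. Escaping of & is left out to keep the example short.

```javascript
// Characters the HTML syntax does not allow in an unquoted attribute value.
const UNSAFE_IN_UNQUOTED = /[\s"'=<>`]/;

// Hypothetical helper: `quote` follows the proposal above
// ('"' or "'" for a quoted value, undefined when there were no quotes).
function renderAttributeValue(value, quote) {
  if (quote === '"') {
    return '"' + value.replace(/"/g, "&quot;") + '"';
  }
  if (quote === "'") {
    return "'" + value.replace(/'/g, "&#39;") + "'";
  }
  // The original value was unquoted: keep it that way only if still valid.
  if (value === "" || UNSAFE_IN_UNQUOTED.test(value)) {
    return '"' + value.replace(/"/g, "&quot;") + '"';
  }
  return value;
}

console.log(renderAttributeValue("image.png", undefined));
console.log(renderAttributeValue("image of something.png", undefined));
console.log(renderAttributeValue("image of something.png", "'"));
```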
- Duplicate attributes
This is very much an edge case, and I think it is a breaking change.
For example: <img src="./image.png" src="./other-image.png" alt="alt" />
Now the parser returns:
{
attribs: {
src: "./image.png",
alt: "alt"
},
}
But onattribute is called twice, as expected.
It would be great to improve it to:
{
  attribs: [
    {
      src: "./image.png",
      alt: "alt"
    },
    {
      src: "./other-image.png",
      alt: "alt"
    }
  ],
}
Thank you again for the good project; I will be happy to get any feedback. I’m also ready to help with any of these problems, as they do not seem complicated.
Hi @evilebottnawi, very interesting use-case, thanks for providing some insights!
(1) should definitely be fixed. I haven't gone through the responsible code in a while, but adding some _updatePosition calls to src/Parser.ts should be a good start.
For (2): So far, I've been pretty adamant to not add any output that does not relate to the semantic meaning. Once (1) is fixed, this should be much easier to implement.
As you said, (3) is a pretty big breaking change. This is the exact use-case of per-attribute events, so generating this yourself is hopefully not too bad.
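A minimal sketch of what "generating this yourself" from per-attribute events could look like for the duplicate-attribute case. The factory name and the shape of the duplicates entries are hypothetical. The callbacks are called by hand here so the example stays self-contained; in practice an object like this would be passed to the parser as its handler.

```javascript
// Hypothetical handler: records attributes that occur more than once on a tag.
function createDuplicateAttributeChecker() {
  const duplicates = [];
  let tagName = null;
  let seen = new Set();

  return {
    duplicates,
    // A new tag starts: forget the attribute names of the previous one.
    onopentagname(name) {
      tagName = name;
      seen = new Set();
    },
    // Called once per attribute, including repeated names.
    onattribute(name, value) {
      if (seen.has(name)) {
        duplicates.push({ tagName, name, value });
      }
      seen.add(name);
    },
  };
}

// Simulates <img src="./image.png" src="./other-image.png" alt="alt" />
const checker = createDuplicateAttributeChecker();
checker.onopentagname("img");
checker.onattribute("src", "./image.png");
checker.onattribute("src", "./other-image.png");
checker.onattribute("alt", "alt");
console.log(checker.duplicates);
```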
(1) should definitely be fixed. I haven't gone through the responsible code in a while, but adding some _updatePosition calls to src/Parser.ts should be a good start.
👍
For (2): So far, I've been pretty adamant to not add any output that does not relate to the semantic meaning. Once (1) is fixed, this should be much easier to implement.
I think quotation marks have a very important semantic meaning. Besides, you already have this information, so why not provide it to developers? It would make life easier. I found that postcss-html uses this package too, and they use hacks for the same purpose. If one of the biggest consumers uses hacks, maybe it's really worth considering an improvement.
As you said, (3) is a pretty big breaking change. This is the exact use-case of per-attribute events, so generating this yourself is hopefully not too bad.
Maybe we can postpone it until the next major release, though I find this decision a little unfortunate. I can even imagine a developer trying to create a linter using this package and not being able to implement a no-duplicate-attributes rule 😄
I've also run into a use case where I think we can benefit from this (at least if cheerio and dom-serializer make use of it).
For my new project, integrity-matters, a tool to check hashes and auto-update HTML integrity attributes and CDN version URLs (based on what is present in node_modules), I'd like to keep inter-attribute whitespace in place, e.g., to have:
<script src="https://unpkg.com/leaflet@1.4.0/dist/leaflet.js"
integrity="sha512-QVftwZFqvtRNi0ZyCtsznlKSWOStnDORoefr1enyq5mVL4tmKB3S/EnC3rRJcxCPavG10IcrVGSmPh6Qw5lwrg=="
crossorigin=""></script>
...not be overwritten after an update into a one-liner like:
<script src="https://unpkg.com/leaflet@1.6.0/dist/leaflet.js" integrity="sha512-gZwIG9x3wUXg2hdXF6+rVkLF/0Vi9U8D2Ntg4Ga5I5BZpVkVxlJWbSQtXPSiUTtC0TjtGOmxa1AJPuV0CPthew==" crossorigin></script>
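One way to keep the original layout, sketched under the assumption that the parser reports the start and end offsets of each attribute value, is to patch the source string in place instead of re-serializing the DOM. The applyEdits and rangeOf helpers and the shortened attribute values are hypothetical.

```javascript
// Hypothetical helper: `edits` are { startIndex, endIndex, text }, where
// endIndex is exclusive and the ranges do not overlap.
function applyEdits(source, edits) {
  // Apply from the end of the string so earlier indices stay valid.
  const ordered = [...edits].sort((a, b) => b.startIndex - a.startIndex);
  let result = source;
  for (const { startIndex, endIndex, text } of ordered) {
    result = result.slice(0, startIndex) + text + result.slice(endIndex);
  }
  return result;
}

const original =
  '<script src="lib@1.4.0.js"\n' +
  '        integrity="sha512-OLD"\n' +
  '        crossorigin=""></script>';

// indexOf is used only for this demo; a parser that reported positions
// would supply startIndex and endIndex directly.
function rangeOf(value, text) {
  const startIndex = original.indexOf(value);
  return { startIndex, endIndex: startIndex + value.length, text };
}

const updated = applyEdits(original, [
  rangeOf("lib@1.4.0.js", "lib@1.6.0.js"),
  rangeOf("sha512-OLD", "sha512-NEW"),
]);
console.log(updated);
```

Because only the value ranges are touched, the line breaks, the indentation between attributes, and the empty crossorigin="" value survive the update.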
Finally getting around to addressing this. The one thing that definitely can be added is information about quotes. We actually have four states here:
- Single quotes (foo='bar')
- Double quotes (foo="bar")
- No quotes around the value (foo=bar)
- No value (foo)
Should they be handled separately?
Adding an array of attributes is not something I want to add here, as it would always be a breaking change. To add this to the existing DOM could be done easily outside of this module. Something like this should do the job:
class DomWithAttributeArrayHandler extends DomHandler {
  _attributes = [];
  onattribute(name, value, quote) {
    this._attributes.push([name, value, quote]);
  }
  onopentag(name, attribs) {
    super.onopentag(name, attribs);
    this._tagStack[
      this._tagStack.length - 1
    ].attributeList = this._attributes;
    this._attributes = [];
  }
}
Finally, adding location information to attributes is also a bit tricky, as onopentag is emitted after all of the attributes. The start of the section would actually have to track back, which is not something that is supported right now. Happy to accept PRs for a more wholistic solution here, I am struggling to come up with a good solution.
- Single quotes (foo='bar')
- Double quotes (foo="bar")
- No quotes around the value (foo=bar)
- No value (foo)
Maybe?
- quote - "'"
- quote - '"'
- quote - null
- quote - undefined
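Assuming the four suggested values map, in order, to the four states listed above, a consumer could rebuild the original attribute text as in this sketch. The serializeAttribute helper is hypothetical, and it assumes value is the raw text from the source, with no entity decoding applied.

```javascript
// Hypothetical helper based on the four `quote` values suggested above:
//   "'"       -> foo='bar'
//   '"'       -> foo="bar"
//   null      -> foo=bar   (value without quotes)
//   undefined -> foo       (no value at all)
function serializeAttribute(name, value, quote) {
  if (quote === undefined) {
    return name;
  }
  if (quote === null) {
    return name + "=" + value;
  }
  return name + "=" + quote + value + quote;
}

console.log(serializeAttribute("foo", "bar", "'"));
console.log(serializeAttribute("foo", "bar", '"'));
console.log(serializeAttribute("foo", "bar", null));
console.log(serializeAttribute("foo", "", undefined));
```

Keeping null and undefined distinct is what lets foo="" and a bare foo round-trip differently, which matters for cases like crossorigin="" in the earlier example.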
Adding an array of attributes is not something I want to add here, as it would always be a breaking change. To add this to the existing DOM could be done easily outside of this module.
Maybe we can postpone it? It is not high priority. Using this._attributes always felt unsafe to me, because it looks like a private variable.
Finally, adding location information to attributes is also a bit tricky, as onopentag is emitted after all of the attributes. The start of the section would actually have to track back, which is not something that is supported right now. Happy to accept PRs for a more wholistic solution here, I am struggling to come up with a good solution.
I'll try to look at it soon; maybe I can find a good solution. But without this information it is very difficult to use the package for future generations. We use this hack: https://github.com/webpack-contrib/html-loader/blob/master/src/plugins/source-plugin.js#L46, maybe it can help.
I pushed a change that adds quotes to the onattribute event.
Using this._attributes always felt unsafe to me, because it looks like a private variable.
this._attributes doesn't actually exist on DomHandler instances; this would be a private property of the extending class 🤷
@fb55 Just curious, do we have the quote property for the attributes argument in the onopentag(tag, attributes) callback? It would be great to have this property everywhere we have access to an attribute.
do we have the quote property for the attributes argument in the onopentag(tag, attributes) callback
onopentag is just a thin layer over onopentagname, onattribute and onopentagend and should be pretty easy to replicate in user land. I'd prefer not to make any changes that either break the existing API or allocate memory that won't be used by a large set of users.
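A rough sketch of such a user-land replacement, built only from the three events named above. The factory name and the { name, value, quote } entry shape are hypothetical, and the events are fired by hand so the example runs without the parser.

```javascript
// Hypothetical wrapper: buffers per-attribute events and hands the callback
// an ordered list that keeps duplicates and quote information.
function createRichOpenTagHandler(onRichOpenTag) {
  let tagName = "";
  let attributes = [];

  return {
    onopentagname(name) {
      tagName = name;
      attributes = [];
    },
    onattribute(name, value, quote) {
      attributes.push({ name, value, quote });
    },
    // All attributes of the tag have been seen at this point.
    onopentagend() {
      onRichOpenTag(tagName, attributes);
    },
  };
}

const seenTags = [];
const richHandler = createRichOpenTagHandler((name, attributes) => {
  seenTags.push({ name, attributes });
});

// Simulates <img src="./image.png" alt=alt>
richHandler.onopentagname("img");
richHandler.onattribute("src", "./image.png", '"');
richHandler.onattribute("alt", "alt", null);
richHandler.onopentagend();
console.log(JSON.stringify(seenTags));
```

Since the buffering lives in the consumer's handler, users who do not need the extra data pay nothing for it.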
@fb55 we can always introduce option(s) for this if you're worried about performance. I don't need this information in some cases, but in other cases it would be very useful or even required. The same goes for locations (startIndex/endIndex). I understand perfectly that I sacrifice performance because I need more information; otherwise I have to look for dirty solutions, which is not very good.
A key question for me, @fb55, as far as making a PR to add location info (optionally or otherwise), is whether you'd accept changes to dom-serializer and DomHandler to take advantage of the feature. If so, I can see about a PR, as I have energy, as that is my real interest (I figured you'd want it solved at the source here, however, rather than hacked into DomHandler or wherever the hack could be applied in those projects). I might also see about passing on the attribute quotes if that API is ready. Thanks!
To give a definite answer here: I don't think the existing DOM structure can support these use-cases. It seems like there might be enough interest to create a separate handler, perhaps as a fork of the existing one. Happy to promote something for that use-case.
I'm closing this ticket as I don't have a good way forward in the existing project.