tdewolff / minify

Go minifiers for web formats

Home Page: https://go.tacodewolff.nl/minify

Parser issues around MathML in HTML tag transforms

keller-tophat opened this issue

Summary

Minifying <math> elements inside an HTML document can produce unexpected attribute and whitespace transforms. The minifier appears to parse them as XML, applying XML whitespace and attribute-quoting rules that are incorrect for HTML.

Version

# minify --version
minify v2.12.9-12-g76935f3

Example

echo 'foo <math display=inline>hello</math> world' | minify --type html 

Expected:

foo <math display=inline>hello</math> world

Actual:

foo <math display="nlin">hello</math>world

Thanks for raising the issue. It is my understanding that <math> and <svg> are XML that can be embedded in HTML, and that this includes the opening tag itself. Thus, using attributes without proper quoting would be an error. But perhaps the tag itself is still HTML and only the content is XML? Or is it all HTML? The specification is a bit vague here, or at least the W3C validator doesn't seem to make a distinction.

Maybe the XML minifier should leave invalid attribute values as-is, which would fix this issue.

I've pushed out a change in the XML minifier.
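As an illustration of the idea of leaving unquoted values as-is, here is a minimal sketch. It assumes that the "nlin" output above comes from dropping the first and last byte of the value as if they were quotes; the output is consistent with that, but the thread does not confirm it. The names stripQuotesNaive and stripQuotesSafe are hypothetical and are not taken from the minify source.

```go
package main

import "fmt"

// stripQuotesNaive drops the first and last byte unconditionally.
// Hypothetical helper: it mimics the assumed cause of the bug and is
// only correct when the value is known to be quoted.
func stripQuotesNaive(val string) string {
	if len(val) < 2 {
		return val
	}
	return val[1 : len(val)-1]
}

// stripQuotesSafe removes the surrounding quotes only when the value
// starts and ends with the same quote character. Anything else,
// including an unquoted (invalid XML) value, is returned unchanged.
func stripQuotesSafe(val string) string {
	if len(val) >= 2 && (val[0] == '"' || val[0] == '\'') && val[len(val)-1] == val[0] {
		return val[1 : len(val)-1]
	}
	return val
}

func main() {
	fmt.Println(stripQuotesNaive("inline"))   // nlin
	fmt.Println(stripQuotesSafe("inline"))    // inline
	fmt.Println(stripQuotesSafe(`"inline"`))  // inline
	fmt.Println(stripQuotesSafe(`'inline"`))  // 'inline" (mismatched, left as-is)
}
```

With the guarded version, display=inline keeps its value whether or not the input quoted it.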

I think there's still an issue around whitespace after a closing math tag. For example, the HTML

a <math display="inline">b</math> c

is minified to

a <math display=inline>b</math>c

losing the whitespace afterwards.

Adding omitSpace = false to the MathToken case in html.go fixes this for me, and all the tests still pass, but I'm not sure if this is the correct approach. It also doesn't account for <math display="block">, in which case I think removing the whitespace is fine.

You're right, should be fixed now.
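The inline versus block distinction raised in the whitespace comment can be sketched as a small decision function. This is only an illustration under assumptions: canOmitSpaceAfterMath is a hypothetical helper, not the change that was actually pushed, and it assumes that a missing display attribute means inline and that the value should be compared case-insensitively.

```go
package main

import (
	"fmt"
	"strings"
)

// canOmitSpaceAfterMath reports whether whitespace following a closing
// </math> tag could be dropped, given the value of the display attribute.
// Hypothetical helper: only block-level math is treated as safe, because
// inline math sits in the text flow and the space separates words.
func canOmitSpaceAfterMath(display string) bool {
	return strings.EqualFold(strings.TrimSpace(display), "block")
}

func main() {
	fmt.Println(canOmitSpaceAfterMath("inline")) // false
	fmt.Println(canOmitSpaceAfterMath("block"))  // true
	fmt.Println(canOmitSpaceAfterMath(""))       // false (assumed default: inline)
}
```

Under this rule, a <math display="inline">b</math> c keeps the space before c, while the same markup with display="block" may lose it.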