Parser issues around MathML in HTML tag transforms
keller-tophat opened this issue
Summary
Minifying <math> elements inside an HTML document can result in unexpected attribute and whitespace transforms. It seems like the minifier is parsing them as XML, applying incorrect whitespace and attribute-quoting expectations.
Version
# minify --version
minify v2.12.9-12-g76935f3
Example
echo 'foo <math display=inline>hello</math> world' | minify --type html
Expected:
foo <math display=inline>hello</math> world
Actual:
foo <math display="nlin">hello</math>world
Thanks for raising the issue. It is my understanding that <math> and <svg> are XML that can be embedded in HTML, including the tag itself. Thus, using attributes without proper quoting would be an error. But perhaps the tag itself is still HTML and only the content is XML? Or is it all HTML? The specification is a bit vague here, or at least the W3C validator doesn't seem to distinguish between these cases.
Maybe the XML minifier should leave invalid attribute values as-is, which would fix this issue.
I've pushed out a change in the XML minifier.
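In the Actual output above, display=inline became "nlin", which is exactly the value with its first and last characters removed. That pattern is consistent with (an assumption, not confirmed in this thread) quote-stripping logic that expects every attribute value to be wrapped in quotes. The Go sketch below contrasts that behaviour with the "leave invalid values as-is" approach suggested above. naiveUnquote and safeUnquote are hypothetical helpers for illustration only; they are not part of minify's API and do not describe the actual change that was pushed.

```go
package main

import "fmt"

// naiveUnquote (hypothetical) always drops the first and last byte, assuming
// the value is wrapped in quotes. On an unquoted value it removes real characters.
func naiveUnquote(raw string) string {
	if len(raw) < 2 {
		return raw
	}
	return raw[1 : len(raw)-1]
}

// safeUnquote (hypothetical) strips quotes only when the value starts and ends
// with the same quote character. Anything else, such as an unquoted
// HTML-style value, is returned unchanged.
func safeUnquote(raw string) string {
	if len(raw) >= 2 {
		q := raw[0]
		if (q == '"' || q == '\'') && raw[len(raw)-1] == q {
			return raw[1 : len(raw)-1]
		}
	}
	return raw
}

func main() {
	fmt.Println(naiveUnquote("inline"))  // nlin
	fmt.Println(safeUnquote("inline"))   // inline
	fmt.Println(safeUnquote(`"inline"`)) // inline
}
```

The safe variant never guesses: if the delimiters are not a matching quote pair, the raw bytes pass through untouched, so the worst case is a missed optimization rather than a corrupted value.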
I think there's still an issue around whitespace after a closing math tag. For example, the HTML
a <math display="inline">b</math> c
is minified to
a <math display=inline>b</math>c
losing the whitespace afterwards.
Adding omitSpace = false to the MathToken case in html.go fixes this for me and passes all the tests, but I'm not sure if this is the correct approach. It also doesn't account for <math display="block">, in which case I think removing the whitespace is fine.
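The rule proposed in this comment (keep the space after inline math, allow dropping it after display="block" math) can be sketched as a standalone Go function. collapseAfterMath is hypothetical: it only operates on the text that follows a closing </math> tag and does not reflect how html.go or its omitSpace flag actually work.

```go
package main

import (
	"fmt"
	"strings"
)

// collapseAfterMath (hypothetical) rewrites the text following a closing
// </math> tag. A leading run of whitespace is collapsed to one space for
// inline math and removed entirely for display="block" math. An empty
// display value (attribute absent) is treated like inline, MathML's default.
func collapseAfterMath(display, rest string) string {
	trimmed := strings.TrimLeft(rest, " \t\n\r\f")
	if trimmed == rest {
		return rest // no leading whitespace to handle
	}
	if display == "block" {
		return trimmed
	}
	return " " + trimmed
}

func main() {
	fmt.Println("a <math display=inline>b</math>" + collapseAfterMath("inline", " c"))
	fmt.Println("a <math display=block>b</math>" + collapseAfterMath("block", "  c"))
}
```

The first line keeps the space before c, matching the expected output in this report; the second drops it, following the suggestion that whitespace next to block math is not significant.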
You're right, should be fixed now.