voku / HtmlMin

:clamp: HtmlMin: HTML Compressor and Minifier via PHP

Incorrect processing of <script type="text/html">

roadster31 opened this issue

Hello,

I often use the construct <script id="some-id" type="text/html"> some HTML code </script> to inject HTML code into the DOM. HtmlMin processes the HTML between <script> and </script> incorrectly: the closing </div> tags are dropped.

What is this feature about (expected vs actual behaviour)?

Source code:

<!doctype html>
<html lang="fr">
<head>
    <title>Test</title>
</head>
<body>
    A Body

    <script id="elements-image-1" type="text/html">
        <div class="place badge-carte">Place du Village<br>250m - 2mn à pied</div>
        <div class="telecabine badge-carte">Télécabine du Chamois<br>250m - 2mn à pied</div>
        <div class="situation badge-carte"><img src="https://domain.tld/assets/frontOffice/kneiss/template-assets/assets/dist/img/08ecd8a.png" alt=""></div>
    </script>
</body>
</html>

Expected behaviour:

<!DOCTYPE html><html lang="fr"><head><title>Test</title></head><body>A Body<script id="elements-image-1" type="text/html">
        <div class="place badge-carte">Place du Village<br>250m - 2mn à pied</div>
        <div class="telecabine badge-carte">Télécabine du Chamois<br>250m - 2mn à pied</div>
        <div class="situation badge-carte"><img src="https://domain.tld/assets/frontOffice/kneiss/template-assets/assets/dist/img/08ecd8a.png" alt=""></div>
    </script></body></html>

Actual behaviour:

<!DOCTYPE html><html lang="fr"><head><title>Test</title></head><body>A Body<script id="elements-image-1" type="text/html">
        <div class="place badge-carte">Place du Village<br>250m - 2mn à pied
        <div class="telecabine badge-carte">Télécabine du Chamois<br>250m - 2mn à pied
        <div class="situation badge-carte"><img src="https://domain.tld/assets/frontOffice/kneiss/template-assets/assets/dist/img/08ecd8a.png" alt="">
    </script></body></html>

How can I reproduce it?

Use the above source code.

Does it take minutes, hours or days to fix?

Not sure. Maybe minutes, if the fix is simply to leave the content of <script type="text/html"> untouched?

Any additional information?

Thanks for your work :)

After a few tests, it seems that DOMDocument::loadHTML() is the root cause of this problem. Loading the test document and saving it immediately gives the following result, where the closing </div> tags are missing:

<!DOCTYPE html>
<?xml encoding="UTF-8" ?><html lang="fr"><head><title>Test</title></head><body>
    A Body

    <script id="elements-image-1" type="text/html">
        <div class="place badge-carte">Place du Village<br>250m - 2mn &agrave; pied
        <div class="telecabine badge-carte">T&eacute;l&eacute;cabine du Chamois<br>250m - 2mn &agrave; pied
        <div class="situation badge-carte"><img src="https://domain.tld/assets/frontOffice/kneiss/template-assets/assets/dist/img/08ecd8a.png" alt="">
    </script></body></html>

I'll investigate and get back to you if I find something interesting about that.
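
To check this on a given PHP installation, a minimal sketch like the following loads a shortened version of the test document and counts the closing </div> tags before and after the round trip. The shortened markup and the variable names are illustrative only. The result depends on the libxml2 version PHP is built against, so no particular count is assumed here.

```php
<?php
// Shortened version of the test document from this issue (illustrative).
$html = <<<'HTML'
<!doctype html>
<html lang="fr">
<head><title>Test</title></head>
<body>
    <script id="elements-image-1" type="text/html">
        <div class="place badge-carte">Place du Village</div>
        <div class="telecabine badge-carte">Telecabine du Chamois</div>
    </script>
</body>
</html>
HTML;

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Serialize the document straight back, without any other processing.
$out = $dom->saveHTML();

// Compare how many closing </div> tags survive the round trip.
printf(
    "closing </div> tags: %d in source, %d after loadHTML()/saveHTML()\n",
    substr_count($html, '</div>'),
    substr_count($out, '</div>')
);
```

If the second number is lower than the first, the tags are being lost in the parser itself, before any minification logic runs.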

After digging through Stack Overflow, it seems that the only workable solution is to parse the HTML as XML, after first converting the self-closing tags so that the XML loader receives a valid XML document:

https://stackoverflow.com/questions/19788017/how-to-combine-phps-domdocument-with-a-javascript-template
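
For anyone stuck on an affected version, a different workaround than the XML approach in the linked answer is to hide the template bodies from the parser before minifying and put them back afterwards. The sketch below is only an illustration of that idea: protectHtmlTemplates() and restoreHtmlTemplates() are hypothetical helper names, the token format is arbitrary, and this is not claimed to be how HtmlMin itself handles the case. It assumes the type attribute is quoted and that the template body contains no literal </script>.

```php
<?php
/**
 * Hypothetical helper: replace the body of every <script type="text/html">
 * with an opaque token and remember the original body in $store.
 */
function protectHtmlTemplates(string $html, array &$store): string
{
    // Group 1: opening tag, group 3: template body, group 4: closing tag.
    $pattern = '#(<script\b[^>]*\btype=(["\'])text/html\2[^>]*>)(.*?)(</script>)#is';

    return preg_replace_callback(
        $pattern,
        function (array $m) use (&$store): string {
            // The token is plain text, so an HTML parser has nothing to rewrite.
            $token = '____TEMPLATE_' . count($store) . '____';
            $store[$token] = $m[3];

            return $m[1] . $token . $m[4];
        },
        $html
    );
}

/**
 * Hypothetical helper: put the original template bodies back.
 */
function restoreHtmlTemplates(string $html, array $store): string
{
    return $store === [] ? $html : strtr($html, $store);
}

$source = '<body>A Body<script id="elements-image-1" type="text/html">'
    . '<div class="place badge-carte">Place du Village<br>250m</div>'
    . '</script></body>';

$store = [];
$protected = protectHtmlTemplates($source, $store);

// The minifier would run here on $protected, for example:
// $protected = (new \voku\helper\HtmlMin())->minify($protected);

$restored = restoreHtmlTemplates($protected, $store);
```

The template body is restored byte for byte, so it is not minified at all; that matches the "ignore the content" behaviour suggested earlier in this issue.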

Fixed in version 3.1.3.