voku / HtmlMin

:clamp: HtmlMin: HTML Compressor and Minifier via PHP

Vanishing closing tag

ozupey opened this issue

What is this feature about (expected vs actual behaviour)?

Some closing tags are vanishing. In my example, the </p> is suddenly gone.

string(94) "<div class=rating><p style="margin: 0;"><span style="width: 100%;"></span>  (2 reviews) </div>"

How can I reproduce it?

$html = '
<div class="rating">
    <p style="margin: 0;">
        <span style="width: 100%;"></span>
    </p>

    (2 reviews)
</div>
';

$htmlMin = new voku\helper\HtmlMin();
$result = $htmlMin->minify($html);

var_dump($result);

Does it take minutes, hours or days to fix?

Dunno.

Any additional information?

Using voku/html-min 4.3.0 and voku/simple_html_dom 4.7.14.

Thanks for the bug report, 👍 fixed in version 4.4.2.

Hi,
I wrote this test after the last commits. It fails.

    public function testKeepPTagIfNeeded2()
    {
        $html = '
		<div>
			<p>
				<span>First Paragraph</span>
			</p>
			Loose Text
			<p>Another Paragraph</p>
		</div>
		';

        $htmlMin = new voku\helper\HtmlMin();
        $result = $htmlMin->minify($html);

        $expected = '<div><p><span>First Paragraph</span> </p> Loose Text <p>Another Paragraph</div>';

        static::assertSame($expected, $result);
    }

I'm not sure about the whitespace in the $expected string, but I am sure about the missing </p> tag.

I think perhaps the closing </p> tag can only be omitted if the <p> node is immediately followed by another <p> node or if it's the last node. But I haven't thought this through. Perhaps there are other edge-cases.
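
To make the rule above concrete, here is a minimal sketch of such a check on top of PHP's built-in DOM extension. It is not HtmlMin's actual implementation: canOmitClosingP() is a hypothetical helper, the tag list is only a subset of the elements that implicitly close a <p>, and skipping whitespace-only text nodes is an assumption.

```php
<?php
// Hypothetical helper, not part of voku/HtmlMin: decides whether the closing
// </p> of a parsed <p> node could be dropped.
function canOmitClosingP(DOMElement $p): bool
{
    // Subset of the elements whose start tag implicitly closes an open <p>.
    $closers = ['address', 'article', 'aside', 'blockquote', 'div', 'dl',
        'fieldset', 'footer', 'form', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
        'header', 'hr', 'main', 'nav', 'ol', 'p', 'pre', 'section', 'table', 'ul'];
    // Parents for which the "last node in the parent" shortcut is not allowed.
    $blockedParents = ['a', 'audio', 'del', 'ins', 'map', 'noscript', 'video'];

    $next = $p->nextSibling;
    // Assumption: whitespace-only text nodes between elements are ignorable.
    while ($next instanceof DOMText && trim($next->nodeValue) === '') {
        $next = $next->nextSibling;
    }

    if ($next === null) {
        // The <p> is the last meaningful node of its parent.
        $parent = $p->parentNode;
        return $parent instanceof DOMElement
            && !in_array(strtolower($parent->nodeName), $blockedParents, true);
    }

    // Loose text, comments, or inline elements after the <p> require </p>.
    return $next instanceof DOMElement
        && in_array(strtolower($next->nodeName), $closers, true);
}

$doc = new DOMDocument();
$doc->loadHTML('<div><p><span>First Paragraph</span></p> Loose Text <p>Another Paragraph</p></div>');

foreach ($doc->getElementsByTagName('p') as $p) {
    // The first <p> is followed by loose text, the second is the last node.
    var_dump(canOmitClosingP($p));
}
```

With the markup from the failing test, the first <p> is followed by loose text and so has to keep its </p>, while the second one is the last node in the <div>.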

@abuyoyo Yep, and we can ignore the cases where there is only whitespace in between ... I could simplify the logic. What do you think? -> 67d39fb

Yeah. That looks good.
As for the whitespace, I'm pretty sure a single space must be preserved on either side of the Loose Text if it was there in the original HTML.
" Loose Text " != "Loose Text"
However:
<p><span></span> </p> evaluates the same as <p><span></span></p> (the space before the </p> can go)
The same goes for <div><p>Text </div>, which I believe evaluates to <div><p>Text</div>.
And the whitespace handling is obviously a separate (and minor) issue.
If I remember correctly, whitespace only needs to be preserved between <li> elements in case someone sets their CSS to display:inline, in which case <li></li><li></li> != <li></li> <li></li>

Ooh.
I was half-wrong.
<div><p>Text </div> is NOT the same as <div><p>Text</div>

So, in fact

<div>
    <p>Text</p>
</div>

should minify to <div><p>Text</div> and NOT to <div><p>Text </div> (which is what the current implementation produces).
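
One way to check that the two outputs are not equivalent is to parse both and compare the text that ends up inside the <p>. This is only a sketch: firstParagraphText() is a hypothetical helper, and it assumes PHP's DOMDocument parser closes the open <p> at </div> and keeps the trailing space in the text node.

```php
<?php
// Hypothetical helper: parses the markup and returns the text content of the
// first <p> element, or an empty string if there is none.
function firstParagraphText(string $html): string
{
    $doc = new DOMDocument();
    // LIBXML_NOERROR silences parser warnings about the omitted </p>.
    $doc->loadHTML($html, LIBXML_NOERROR);
    $p = $doc->getElementsByTagName('p')->item(0);

    return $p === null ? '' : $p->textContent;
}

$withSpace = firstParagraphText('<div><p>Text </div>');
$withoutSpace = firstParagraphText('<div><p>Text</div>');

// Compare the parsed paragraph text of both variants.
var_dump($withSpace === $withoutSpace);
```

If the comparison reports a difference, the trailing space is part of the parsed document and the minifier should not introduce it.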