mojolicious / mojo

Mojolicious - Perl real-time web framework

Home Page: https://mojolicious.org

Mojo::DOM misparses <script> elements

mauke opened this issue

commented
  • Mojolicious version: 9.30
  • Perl version: v5.36.0
  • Operating system: Ubuntu 22.04.1 LTS

Steps to reproduce the behavior

#!/usr/bin/env perl
use v5.12.0;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new(do { local $/; scalar readline DATA });

say for $dom->find('div')->each;

__DATA__
<!DOCTYPE html>
<h1>Welcome to HTML</h1>
<script>
    console.log('< /script> is safe');
    /* <div>XXX this is not a div element</div> */
</script>

Expected behavior

No output as the document contains no div elements. (document.querySelectorAll('div') in a browser agrees.)

Actual behavior

Output:

<div>XXX this is not a div element</div>

I've not looked at the spec yet, but this would probably be the section to check for the correct behavior.

commented

This one looks relevant: https://html.spec.whatwg.org/multipage/parsing.html#script-data-less-than-sign-state

After seeing a < in a <script> element, the parser looks at the next character. Only ! and / are special. For any other character (including space), the < is parsed literally and scanning continues.

This line probably needs some fixing.
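
To make the rule from the spec concrete, here is a minimal sketch in plain Perl. It is not Mojo::DOM's actual code; script_raw_text is a hypothetical helper written for illustration. It assumes the input starts right after the opening <script> tag, and it ignores the spec's escaped states (the <!-- handling inside script data). The end of the element is only recognized at "</script" followed by whitespace, "/" or ">", so "< /script>" and "<div>" are plain script text.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Mojolicious): given the input that follows
# an opening <script> tag, return the raw script text up to the real end tag.
sub script_raw_text {
    my ($html) = @_;

    # "</script" ends the element only when followed by whitespace, "/" or ">".
    # Anything else after "<" (including a space) leaves the "<" as literal text.
    if ($html =~ m{\A(.*?)</script(?=[\t\n\f />])}is) {
        return $1;
    }

    # No end tag found: everything up to end of input is script data.
    return $html;
}

my $input = <<'HTML';

    console.log('< /script> is safe');
    /* <div>XXX this is not a div element</div> */
</script>
<p>after the script</p>
HTML

# Prints the two script lines, including the commented-out <div>.
print script_raw_text($input);
```

With this rule the <div> stays inside the script's text and never becomes an element, which matches what document.querySelectorAll('div') reports in a browser.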

xmllint appears to recognize the <script> block all the way to the final closing </script> (though it seems to have issues with comments):

$ xmllint --html --debug mojo-issue-2014.html 
mojo-issue-2014.html:5: HTML parser error : Unexpected end tag : div
/* <div>XXX this is not a div element</div> */
                                           ^
HTML DOCUMENT
URL=mojo-issue-2014.html
standalone=true
  DTD(html)
  ELEMENT html
    ELEMENT body
      ELEMENT h1
        TEXT
          content=Welcome to HTML
      TEXT
        content= 
      ELEMENT script
        CDATA_SECTION
          content=     console.log('< /script> is safe'); ...

$ xmllint --html --xpath //div mojo-issue-2014.html
mojo-issue-2014.html:5: HTML parser error : Unexpected end tag : div
    /* <div>XXX this is not a div element</div> */
                                               ^
XPath set is empty

$ xmllint --html --xpath //script mojo-issue-2014.html
mojo-issue-2014.html:5: HTML parser error : Unexpected end tag : div
    /* <div>XXX this is not a div element</div> */
                                               ^
<script><![CDATA[
    console.log('< /script> is safe');
    /* <div>XXX this is not a div element */
]]></script>

Also fixed in @mojojs/dom: mojolicious/dom.js@90ad748