mojolicious / mojo

Mojolicious - Perl real-time web framework

Home Page: https://mojolicious.org


Mojo::DOM misparses <script> elements (another way)

mauke opened this issue

commented
  • Mojolicious version: 9.30
  • Perl version: v5.36.0
  • Operating system: Ubuntu 22.04.1 LTS

Steps to reproduce the behavior

#!/usr/bin/env perl
use v5.12.0;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new(do { local $/; scalar readline DATA });

say for $dom->find('p')->each;

__DATA__
<!DOCTYPE html>
<h1>Welcome to HTML</h1>
<script>
    console.log('this is a script element and should be executed');
// </script asdf> <p>
    console.log('this is not a script');
    // <span data-wtf="</script>">:-)</span>

Expected behavior

Output similar to:

<p>
    console.log(&#39;this is not a script&#39;);
    // <span data-wtf="&lt;/script&gt;">:-)</span>
</p>

An (implicitly closed) p element exists, so it should be found.

Actual behavior

No output.

I've not looked at the spec yet, but this would probably be the section to check for the correct behavior.

commented

The relevant section is this one: https://html.spec.whatwg.org/multipage/parsing.html#script-data-end-tag-name-state

After seeing </ followed by a letter inside a <script> element, the tokenizer ends up in the "script data end tag name" state. There it accumulates letters into the name of a tentative end tag. On seeing whitespace (space, tab, line feed, form feed), it checks whether the accumulated name matches "script"; if so, script parsing stops (the characters seen so far are treated as a script end tag) and the tokenizer continues by parsing attributes. In the example above, </script asdf> therefore closes the script element, and the <p> that follows is ordinary markup.

Now, end tags with attributes are technically an error: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-end-tag-with-attributes
But a forgiving parser will simply ignore them.
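
To make the rule above concrete, here is a minimal sketch in plain Perl. It is not Mojo::DOM's implementation: split_script_data is a hypothetical helper invented for illustration. It assumes the input is the text following a <script> start tag, treats </script as an end tag only when the name is followed by tab, line feed, form feed, space, "/" or ">", and skips any attributes by reading naively up to the next ">". The "script data escaped" states (for <!-- inside scripts) are deliberately ignored.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Mojo::DOM): split the text that follows a
# <script> start tag into the script body and the markup after the end tag.
sub split_script_data {
    my ($html) = @_;

    # The end tag name "script" (matched case-insensitively) must be followed
    # by tab, LF, FF, space, "/" or ">". Anything up to the next ">" is
    # treated as ignorable end-tag attributes (a simplification).
    if ($html =~ m{\A(.*?)</script(?=[\t\n\f\x20/>])[^>]*>?(.*)\z}is) {
        return ($1, $2);
    }

    # No script end tag found: everything is script data.
    return ($html, '');
}

# Usage with a shortened version of the test case from this issue.
my ($body, $rest) = split_script_data(
    "\n    console.log('a');\n// </script asdf> <p>\n    console.log('b');\n"
);
print "script body: [$body]\n";
print "remaining markup: [$rest]\n";
```

With this rule, the remaining markup begins with " <p>", which is why a p element is expected in the parsed document.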