Mojo::DOM misparses <script> elements (another way)
mauke opened this issue
- Mojolicious version: 9.30
- Perl version: v5.36.0
- Operating system: Ubuntu 22.04.1 LTS
Steps to reproduce the behavior
#!/usr/bin/env perl
use v5.12.0;
use warnings;
use Mojo::DOM;
my $dom = Mojo::DOM->new(do { local $/; scalar readline DATA });
say for $dom->find('p')->each;
__DATA__
<!DOCTYPE html>
<h1>Welcome to HTML</h1>
<script>
console.log('this is a script element and should be executed');
// </script asdf> <p>
console.log('this is not a script');
// <span data-wtf="</script>">:-)</span>
Expected behavior
Output similar to:
<p>
console.log('this is not a script');
// <span data-wtf="</script>">:-)</span>
</p>
An (implicitly closed) p element exists, so it should be found.
Actual behavior
No output.
I've not looked at the spec yet, but this would probably be the section to check for the correct behavior.
The relevant section is this one: https://html.spec.whatwg.org/multipage/parsing.html#script-data-end-tag-name-state
After seeing </ (followed by a letter) in a <script> element, we end up in the "script data end tag name" state. Here we accumulate letters into the name of a temporary tag. On seeing whitespace (space, tab, line feed, form feed), we check that the temporary tag name matches "script"; if so, we stop script parsing (treating the characters found as a script end tag) and continue parsing for attributes.
Now, end tags with attributes are technically an error: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-end-tag-with-attributes
But a forgiving parser will simply ignore them.
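To make the rule concrete, here is a rough sketch of how I read it, in plain Perl with no Mojolicious involved. It is not how Mojo::DOM works internally and it is not a full tokenizer: split_script is a made-up helper name, the regexes are my own approximation of the spec states, and the "script data escaped" states (the <!-- ... --> handling inside scripts) are ignored entirely.

```perl
#!/usr/bin/env perl
use v5.12.0;
use warnings;

# Hypothetical helper (not part of Mojo::DOM): given the text that follows a
# "<script>" start tag, return (script source, remaining markup).
sub split_script {
    my ($html) = @_;

    # "</script" only ends the element when the tag name is complete, i.e.
    # the next character is whitespace, "/" or ">". Anything else (such as
    # "</scripty") is still script data.
    return ($html, '') unless $html =~ m{</script(?=[\t\n\f />])}ig;
    my $script = substr($html, 0, $-[0]);

    # Skip everything between the tag name and the closing ">". Quoted
    # attribute values are skipped as a unit, so a ">" inside them does not
    # end the tag. The attributes themselves are simply thrown away.
    $html =~ m{\G(?:[^>"']|"[^"]*"|'[^']*')*>}gc
        or return ($script, '');    # unterminated end tag: nothing follows

    return ($script, substr($html, pos($html)));
}

# The interesting part of the example from this issue.
my ($script, $rest) = split_script(<<'EOT');
console.log('this is a script element and should be executed');
// </script asdf> <p>
console.log('this is not a script');
EOT
say "script: $script";
say "rest:   $rest";
```

With this reading, the script ends at "</script asdf>", the "asdf" attribute is dropped, and the <p> lands in the remaining markup where the normal parser can pick it up.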