Cannot use protocol-relative URL in script src attribute
ronosm opened this issue
It's not possible to use a protocol-relative URL in the src attribute of a script tag.
The naughtyHref function handles it, but afterwards this code runs:
const parsed = new URL(value);
This line throws an exception when a protocol-relative URL is used, and the catch block then sets allowed to false:
} catch (e) {
allowed = false;
}
As a result, the src attribute is removed from the script tag.
This happens between lines 325 and 355.
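For illustration, here is a simplified, hypothetical reduction of the parse-then-check pattern described above. It is not the actual sanitize-html source, and the function name isSrcAllowed is made up. It shows why a protocol-relative src ends up rejected: the parse throws before any hostname check can run.

```javascript
// Hypothetical reduction of the pattern in the issue; not the real sanitize-html code.
function isSrcAllowed(value) {
  let allowed = true;
  try {
    // Throws a TypeError for '//host/path' because no base URL is supplied.
    const parsed = new URL(value);
    // A real implementation would compare parsed.hostname against an allow-list here.
  } catch (e) {
    // Any parse failure, including a protocol-relative URL, disallows the value.
    allowed = false;
  }
  return allowed;
}

console.log(isSrcAllowed('https://cdn.example.com/lib.js')); // true
console.log(isSrcAllowed('//cdn.example.com/lib.js'));       // false
```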
+1
here:
https://github.com/apostrophecms/sanitize-html/blob/3cdc262/index.js#L333-L338
- there's some discussion over "protocol-relative" URLs and apparently they're considered an anti-pattern.
- but surely referencing an https:// resource from an http:// origin creates CORS issues.
- these URLs are supported in the browser, and legacy HTML content is full of them.
While new URL('//my.url') breaks without a protocol, I don't think it should be up to sanitize-html to enforce the scheme of URLs when no allowedScriptHostnames is defined.
I.e. shouldn't const parsed = new URL(value); move inside the if block?
https://github.com/apostrophecms/sanitize-html/blob/3cdc262/index.js#L340-L348
edit: that probably applies to iframe src too.
tag @yorickgirard
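The suggestion to parse only when an allow-list is configured could look roughly like this. It is a hypothetical standalone function, not sanitize-html's code; the options object borrows the allowedScriptHostnames name from the comment, but its exact shape here is an assumption.

```javascript
// Hypothetical sketch: only police URLs when an explicit hostname allow-list exists.
function isScriptSrcAllowed(value, options = {}) {
  const hostnames = options.allowedScriptHostnames;
  // No allow-list configured: leave the URL alone. Scheme checks such as
  // naughtyHref are assumed to have already run elsewhere.
  if (!hostnames) {
    return true;
  }
  try {
    const parsed = new URL(value);
    return hostnames.includes(parsed.hostname);
  } catch (e) {
    // With an allow-list set, unparseable values (including protocol-relative ones) still fail.
    return false;
  }
}

console.log(isScriptSrcAllowed('//cdn.example.com/lib.js')); // true
console.log(isScriptSrcAllowed('//cdn.example.com/lib.js', { allowedScriptHostnames: ['cdn.example.com'] })); // false
```

Note that with an allow-list present, protocol-relative URLs are still rejected by this sketch; combining it with a more tolerant parser would be needed to accept them in that case too.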
Is this happening in the browser only? I believe the WHATWG URL parser in nodejs supports it.
A small subclass of the URL class to work around this issue probably wouldn't be difficult to contribute as a PR. The idea would be to stub in the https: protocol and then strip it out again in toString if it was stubbed in.
(As a PR on this module that is. Modifying URL upstream is unrealistic of course.)
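A minimal sketch of that subclass idea, assuming https: as the stubbed-in protocol. The class name ProtocolRelativeURL is hypothetical and is not part of sanitize-html or Node.

```javascript
// Hypothetical wrapper: accepts protocol-relative input by temporarily
// prefixing "https:" and removes that prefix again when serialized.
class ProtocolRelativeURL extends URL {
  #stubbed = false;

  constructor(input, base) {
    const text = String(input);
    const stubbed = text.startsWith('//');
    super(stubbed ? 'https:' + text : text, base);
    this.#stubbed = stubbed;
  }

  toString() {
    const href = super.toString();
    // Strip the stubbed-in scheme so the original protocol-relative form survives.
    return this.#stubbed ? href.slice('https:'.length) : href;
  }

  toJSON() {
    return this.toString();
  }
}

const u = new ProtocolRelativeURL('//google.com/lib.js');
console.log(u.hostname);   // google.com
console.log(u.toString()); // //google.com/lib.js
console.log(String(new ProtocolRelativeURL('ftp://google.com'))); // ftp://google.com/
```

Only toString and toJSON are adjusted in this sketch; properties such as href and protocol still report the stubbed https: scheme, which may or may not be what a hostname check wants.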
@boutell, I guess what I'm saying is that in the absence of an explicit allow-list of domains, it shouldn't try to police URLs at all.
$ node
Welcome to Node.js v16.14.0.
Type ".help" for more information.
> new URL('//google.com')
Uncaught TypeError [ERR_INVALID_URL]: Invalid URL
at __node_internal_captureLargerStackTrace (node:internal/errors:464:5)
at new NodeError (node:internal/errors:371:5)
at onParseError (node:internal/url:552:9)
at new URL (node:internal/url:628:5) {
input: '//google.com',
code: 'ERR_INVALID_URL'
}
> new URL('ftp://google.com');
URL {
href: 'ftp://google.com/',
origin: 'ftp://google.com',
protocol: 'ftp:',
username: '',
password: '',
host: 'google.com',
hostname: 'google.com',
port: '',
pathname: '/',
search: '',
searchParams: URLSearchParams {},
hash: ''
}
>
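The error above is consistent with the WHATWG URL parser treating '//google.com' as a relative reference that needs a base URL. A quick check shows the same string parses once a base is supplied; the placeholder base https://placeholder.invalid is an arbitrary assumption, and the result inherits the base's scheme.

```javascript
const input = '//google.com';

// Without a base, a protocol-relative reference is rejected.
let threw = false;
try {
  new URL(input);
} catch (e) {
  threw = e instanceof TypeError;
}

// With a base, it resolves and takes the base's scheme.
const resolved = new URL(input, 'https://placeholder.invalid');

console.log(threw);         // true
console.log(resolved.href); // https://google.com/
```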
I don't think it would be a good idea to accept invalid URLs, but I agree that protocol-relative URLs should not be considered invalid, at least by default, at least not yet. This is why I'm suggesting a subclass wrapper for URL that accepts that particular case without reimplementing URL.
It's just that until a few updates ago, sanitize-html didn't have this issue, right?
And in the browser, where the sanitized HTML ends up, protocol-relative URLs are not invalid.
src has always been passed through naughtyHref no matter what the tag is. I think what we're seeing is that Node 16 now has the same strict policy on protocol-relative URLs that Safari enforces. A more tolerant subclass of URL would resolve it.