Cannot use protocol-relative URL in script src attribute

Question

ronosm opened this issue 2 years ago

It's not possible to use a protocol-relative URL in the src attribute of a script tag.

The naughtyHref function handles this case, but afterwards this code is executed:

const parsed = new URL(value);

This throws an exception when a protocol-relative URL is used, and the catch block then sets allowed to false:

} catch (e) {
  allowed = false;
}

As a result, the src attribute is removed from the script tag.

This happens between lines 325 and 355.
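The failure described above can be reproduced outside sanitize-html. The following is a minimal sketch, not the library's code: parsesWithoutBase is a hypothetical helper, and example.com and placeholder.invalid are placeholder hosts. It shows that new URL() throws for a protocol-relative value when no base is given, and that the same value parses once a base URL supplies the scheme.

```javascript
// Hypothetical helper (not part of sanitize-html):
// returns true when `new URL(value)` parses without throwing.
function parsesWithoutBase(value) {
  try {
    new URL(value);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(parsesWithoutBase('//example.com/app.js'));       // false
console.log(parsesWithoutBase('https://example.com/app.js')); // true

// With a base URL, the protocol-relative value inherits the base's scheme.
const resolved = new URL('//example.com/app.js', 'https://placeholder.invalid');
console.log(resolved.href); // https://example.com/app.js
```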

Georgi Marinov · Answer 1 · Sun Feb 20 2022 23:15:54 GMT+0800 (China Standard Time)

+1

here:

https://github.com/apostrophecms/sanitize-html/blob/3cdc262/index.js#L333-L338

there's some discussion over "protocol-relative" URLs and apparently they're considered an anti-pattern.
but surely referencing an https:// resource from http:// origin creates CORS issues.
these URLs are supported in the browser and legacy HTML content is full of them

while new URL('//my.url') breaks without protocol, i don't think it should be up to sanitize-html to enforce the scheme of URLs when no allowedScriptHostnames is defined?

i.e. const parsed = new URL(value); should move inside the if block?

https://github.com/apostrophecms/sanitize-html/blob/3cdc262/index.js#L340-L348

edit: that probably applies to iframe src too

tag @yorickgirard
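The suggestion above, parsing the URL only when an allow-list is configured, might look roughly like this. It is a sketch under assumptions, not the sanitize-html source: isScriptSrcAllowed is a hypothetical function, and only the allowedScriptHostnames option mentioned in this thread is modeled.

```javascript
// Hypothetical sketch, not the actual sanitize-html implementation.
// The URL is parsed only when a hostname allow-list has been configured.
function isScriptSrcAllowed(value, options = {}) {
  const { allowedScriptHostnames } = options;

  // No allow-list: there is nothing to check the hostname against,
  // so the value is not parsed and is not rejected here.
  if (!allowedScriptHostnames) {
    return true;
  }

  let parsed;
  try {
    parsed = new URL(value);
  } catch (e) {
    // Unparseable values are still rejected when an allow-list is in effect.
    return false;
  }
  return allowedScriptHostnames.includes(parsed.hostname);
}

console.log(isScriptSrcAllowed('//example.com/app.js')); // true
console.log(isScriptSrcAllowed('https://example.com/app.js', {
  allowedScriptHostnames: [ 'example.com' ]
})); // true
console.log(isScriptSrcAllowed('//example.com/app.js', {
  allowedScriptHostnames: [ 'example.com' ]
})); // false
```

In this sketch a protocol-relative URL is still rejected once an allow-list is set, because new URL() cannot parse it without a scheme. Handling that case needs something like the wrapper discussed in the replies below.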

Tom Boutell · Answer 2 · Tue Feb 22 2022 21:06:09 GMT+0800 (China Standard Time)

Is this happening in the browser only? I believe the WHATWG URL parser in nodejs supports it.

A small subclass of the URL class to work around this issue probably wouldn't be difficult to contribute as a PR. The idea being to stub in the https: protocol but then stub it out again in toString if it was stubbed in.

Tom Boutell · Answer 3 · Tue Feb 22 2022 21:06:21 GMT+0800 (China Standard Time)

(As a PR on this module that is. Modifying URL upstream is unrealistic of course.)
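A rough sketch of the subclass idea described above, assuming the class name ProtocolRelativeURL and the isProtocolRelative property (both hypothetical, neither exists in sanitize-html): it stubs in https: so the built-in parser accepts the value, then strips it again in toString.

```javascript
// Hypothetical subclass, not part of sanitize-html or Node.js.
class ProtocolRelativeURL extends URL {
  constructor(value) {
    const isProtocolRelative = typeof value === 'string' && value.startsWith('//');
    // Stub in a scheme so the WHATWG parser accepts the input.
    super(isProtocolRelative ? `https:${value}` : value);
    // Remember that the scheme was added so toString can remove it.
    this.isProtocolRelative = isProtocolRelative;
  }

  toString() {
    const full = super.toString();
    // Strip the stubbed-in scheme again, leaving the leading "//".
    return this.isProtocolRelative ? full.slice('https:'.length) : full;
  }
}

const parsed = new ProtocolRelativeURL('//example.com/app.js');
console.log(parsed.hostname);   // example.com
console.log(parsed.toString()); // //example.com/app.js
```

Only toString is overridden in this sketch; the href and protocol getters still report the stubbed-in https: scheme, so a real implementation would need to decide how those should behave.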

Georgi Marinov · Answer 4 · Tue Feb 22 2022 22:17:22 GMT+0800 (China Standard Time)

@boutell, I guess what I'm saying is that in the absence of an explicit allow-list of domains, it shouldn't even try to police URLs.

$ node
Welcome to Node.js v16.14.0.
Type ".help" for more information.

> new URL('//google.com')
Uncaught TypeError [ERR_INVALID_URL]: Invalid URL
    at __node_internal_captureLargerStackTrace (node:internal/errors:464:5)
    at new NodeError (node:internal/errors:371:5)
    at onParseError (node:internal/url:552:9)
    at new URL (node:internal/url:628:5) {
  input: '//google.com',
  code: 'ERR_INVALID_URL'
}

> new URL('ftp://google.com');
URL {
  href: 'ftp://google.com/',
  origin: 'ftp://google.com',
  protocol: 'ftp:',
  username: '',
  password: '',
  host: 'google.com',
  hostname: 'google.com',
  port: '',
  pathname: '/',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}
>

Tom Boutell · Answer 5 · Tue Feb 22 2022 22:56:07 GMT+0800 (China Standard Time)

I don't think it would be a good idea to accept invalid URLs, but I agree that protocol-relative URLs should not be considered invalid, at least by default, at least not yet. This is why I'm suggesting a subclass wrapper for URL that accepts that particular case without reimplementing URL.

Tom Boutell · Answer 6 · Tue Feb 22 2022 22:58:13 GMT+0800 (China Standard Time)

(The lack of support for protocol relative URLs in Node 16 does increase the priority on our end but it's not something we can immediately schedule here as we don't have a client requirement for it right now.)

Georgi Marinov · Answer 7 · Tue Feb 22 2022 23:58:33 GMT+0800 (China Standard Time)

it's just that till a few updates ago, sanitize-html didn't have this issue, right?

and in the browser, where the sanitized html ends up, protocol-relative URLs are not invalid.

Tom Boutell · Answer 8 · Wed Feb 23 2022 00:10:39 GMT+0800 (China Standard Time)

src has always been passed through naughtyHref no matter what the tag is. I think what we're seeing is that Node 16 now has the same strict policy on protocol-relative URLs that is enforced by Safari. A more tolerant subclass of URL would resolve it.