Protocol-relative (no `http(s):`) URL issue: Script cache issue and anti-pattern consideration
riccardoporreca opened this issue
HTMLProofer supports protocol-relative URLs, which start with `//` but have no protocol:
html-proofer/lib/html_proofer/attribute/url.rb
Lines 24 to 25 in 976644f
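For readers unfamiliar with the term, here is a minimal sketch of what "protocol-relative" means. It is not the code from `url.rb`; the helper names `protocol_relative?` and `resolve` and the `example.com` page URLs are made up for illustration. It only relies on Ruby's standard `uri` library.

```ruby
require "uri"

# Hypothetical helper (not HTMLProofer's API): a protocol-relative URL
# starts with "//" and therefore carries a host but no scheme.
def protocol_relative?(url)
  url.start_with?("//")
end

# Resolve a URL against the page it appears on, the way a browser would:
# a protocol-relative URL inherits the scheme of the page.
def resolve(url, page_url)
  URI.join(page_url, url).to_s
end

src = "//maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"

puts protocol_relative?(src)        # true
puts URI.parse(src).scheme.inspect  # nil, no scheme present
puts URI.parse(src).host            # maxcdn.bootstrapcdn.com

# The effective scheme depends on the (assumed) page URL:
puts resolve(src, "https://example.com/index.html")
puts resolve(src, "http://example.com/index.html")
```

The last two lines show why the effective URL is ambiguous when checking a local file: there is no page scheme to inherit, so a checker has to pick one.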
There is currently a small issue with the cache for protocol-relative Script `src` attributes, which can be reproduced using the following protocol-relative.html:
<!DOCTYPE html>
<html>
<body>
<script src="//maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"></script>
<img alt="An existing image" src="//upload.wikimedia.org/wikipedia/en/thumb/2/22/Heckert_GNU_white.svg/256px-Heckert_GNU_white.svg.png" />
<a href="//github.com/octocat/Spoon-Knife/issues">An HTTPS link!</a>
<meta property="og:image" content="//github.com/favicon.ico" />
<link rel="icon" class="js-site-favicon" type="image/svg+xml" href="//github.githubassets.com/favicons/favicon.svg"></body>
</html>
First run populating the cache
bundle exec htmlproofer protocol-relative.html --log-level debug --cache '{"timeframe": {"external": "1d"}}' --checks Links,Images,Scripts,OpenGraph,Favicon
Second run
bundle exec htmlproofer protocol-relative.html --log-level debug --cache '{"timeframe": {"external": "1d"}}' --checks Links,Images,Scripts,OpenGraph,Favicon
# Found 5 external links in the cache
# Removing https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js from external cache (not detected anymore)
# Removing 1 outdated external link from the cache
# Adding //maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js to external cache
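The log above suggests a key mismatch: the cached entry uses the `https://` form, while the Script check appears to look up the raw `//` form on the second run. The following toy model illustrates that behaviour. It is not HTMLProofer's cache implementation; `normalize` and `cache_diff` are hypothetical names introduced for this sketch.

```ruby
# Toy model of an external-URL cache keyed by URL string.
# Assumption: protocol-relative URLs are normalized to https:// somewhere,
# but not consistently on every code path.
def normalize(url)
  url.start_with?("//") ? "https:#{url}" : url
end

# Compare the keys stored by a previous run with the keys seen in this run.
# Returns [removed, added], mirroring the "Removing ..." / "Adding ..." log lines.
def cache_diff(cached_keys, detected_keys)
  [cached_keys - detected_keys, detected_keys - cached_keys]
end

raw = "//maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"
cached = [normalize(raw)] # first run stored the https:// form

# Inconsistent lookup: the second run uses the raw src as the key.
removed, added = cache_diff(cached, [raw])
puts "Removing #{removed.first} (not detected anymore)"
puts "Adding #{added.first}"

# Consistent lookup: both runs use the normalized URL, so nothing changes.
removed_ok, added_ok = cache_diff(cached, [normalize(raw)])
puts "removed: #{removed_ok.size}, added: #{added_ok.size}"
```

With a consistent key on both runs, the entry is found in the cache and no removal or re-adding takes place.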
This can be easily fixed by using @script.url
in
However, protocol-relative URLs are nowadays considered an anti-pattern, see e.g. https://github.com/konklone/cdns-to-https/blob/c455ec73817abff4946621402cf16f4de524d22d/README.md#background, https://www.designcise.com/web/tutorial/why-protocol-relative-url-are-no-longer-relevant, https://www.holisticseo.digital/technical-seo/web-security/protocol-relative-url.
Given this, instead of converting them to https:// internally, HTMLProofer could detect them as errors and report something like "script link //maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js is a protocol-relative URL, use explicit https:// instead".
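As a rough sketch of the detection idea, assuming a hypothetical helper `protocol_relative_failure` (not part of HTMLProofer) that returns the proposed message for a protocol-relative URL and `nil` otherwise:

```ruby
# Hypothetical helper, for illustration only.
# kind is a label such as "script link"; url is the raw attribute value.
def protocol_relative_failure(kind, url)
  return nil unless url.start_with?("//")

  "#{kind} #{url} is a protocol-relative URL, use explicit https:// instead"
end

puts protocol_relative_failure(
  "script link",
  "//maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"
)

# URLs with an explicit scheme produce no failure.
p protocol_relative_failure("script link", "https://example.com/app.js")
```

Returning `nil` for the non-failing case keeps the helper easy to call from each check before any network request is made.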
@gjtorikian, I have prepared both the small fix, in case we still want to allow protocol-relative URLs, and the detection as failures:
In the latter case, running on the protocol-relative.html above yields:
For the Favicon check, the following failures were found:
* At protocol-relative.html:8:
favicon link //github.githubassets.com/favicons/favicon.svg is a protocol-relative URL, use explict https:// instead
For the Images check, the following failures were found:
* At protocol-relative.html:5:
image link //upload.wikimedia.org/wikipedia/en/thumb/2/22/Heckert_GNU_white.svg/256px-Heckert_GNU_white.svg.png is a protocol-relative URL, use explict https:// instead
For the Links check, the following failures were found:
* At protocol-relative.html:6:
//github.com/octocat/Spoon-Knife/issues is a protocol-relative URL, use explict https:// instead
* At protocol-relative.html:8:
//github.githubassets.com/favicons/favicon.svg is a protocol-relative URL, use explict https:// instead
For the OpenGraph check, the following failures were found:
* At protocol-relative.html:7:
open graph link //github.com/favicon.ico is a protocol-relative URL, use explict https:// instead
For the Scripts check, the following failures were found:
* At protocol-relative.html:4:
script link //maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js is a protocol-relative URL, use explict https:// instead
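To show how per-line reports like the ones above could be produced, here is a standalone sketch that scans an HTML string for protocol-relative attribute values and records their line numbers. It is regex-based for brevity and is not how the prepared branch works; the method name `protocol_relative_urls`, the constant, and the `example.com` URLs are assumptions made for this example.

```ruby
# Illustration only: a real checker would use an HTML parser, not a regex.
PROTOCOL_RELATIVE_ATTR = /\b(?:src|href|content)="(\/\/[^"]+)"/

# Returns [line_number, url] pairs for every protocol-relative attribute value.
def protocol_relative_urls(html)
  html.each_line.with_index(1).flat_map do |line, number|
    line.scan(PROTOCOL_RELATIVE_ATTR).map { |(url)| [number, url] }
  end
end

html = <<~HTML
  <!DOCTYPE html>
  <html>
  <body>
  <script src="//cdn.example.com/app.js"></script>
  <a href="https://example.com/">explicit scheme</a>
  <img alt="logo" src="//img.example.com/logo.png" />
  </body>
  </html>
HTML

protocol_relative_urls(html).each do |number, url|
  puts "line #{number}: #{url} is a protocol-relative URL"
end
```

Only lines 4 and 6 of the sample are reported; the link with an explicit `https://` scheme is left alone.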
Happy to create a PR for whichever of the two alternatives you deem relevant.
Yes, I think reporting as an error now makes sense, rather than silently accepting.