gjtorikian / html-proofer

Test your rendered HTML files to make sure they're accurate.

Protocol-relative (no `http(s):`) URL issue: Script cache issue and anti-pattern consideration

riccardoporreca opened this issue

HTMLProofer supports protocol-relative URLs, which start with // but omit the protocol:

# convert "//" links to "https://"
@url.start_with?("//") ? @url = "https:#{@url}" : @url
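To see the conversion in isolation, here is a minimal sketch of the same logic as a pure function. The helper name `normalize_protocol_relative` is hypothetical and only for illustration; it is not part of HTMLProofer's API.

```ruby
# Hypothetical helper (not HTMLProofer's API): prefix protocol-relative
# URLs with "https:" and return every other URL unchanged.
def normalize_protocol_relative(url)
  url.start_with?("//") ? "https:#{url}" : url
end

puts normalize_protocol_relative("//github.com/favicon.ico")
puts normalize_protocol_relative("https://github.com/favicon.ico")
puts normalize_protocol_relative("/local/path.html")
```

Only the first URL is rewritten; absolute URLs and site-relative paths (a single leading slash) pass through untouched.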

There is currently a small issue with the cache for protocol-relative script src attributes, which can be reproduced using the following protocol-relative.html:

<!DOCTYPE html>
<html>
  <body>
    <script src="//maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"></script>
    <img alt="An existing image" src="//upload.wikimedia.org/wikipedia/en/thumb/2/22/Heckert_GNU_white.svg/256px-Heckert_GNU_white.svg.png" /> </p>
    <a href="//github.com/octocat/Spoon-Knife/issues">An HTTPS link!</a></p>
    <meta property="og:image" content="//github.com/favicon.ico" />
    <link rel="icon" class="js-site-favicon" type="image/svg+xml" href="//github.githubassets.com/favicons/favicon.svg"></body>
</html>

First run, populating the cache:

bundle exec htmlproofer protocol-relative.html --log-level debug --cache '{"timeframe": {"external": "1d"}}' --checks Links,Images,Scripts,OpenGraph,Favicon

Second run:

bundle exec htmlproofer protocol-relative.html --log-level debug --cache '{"timeframe": {"external": "1d"}}' --checks Links,Images,Scripts,OpenGraph,Favicon
# Found 5 external links in the cache
# Removing https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js from external cache (not detected anymore)
# Removing 1 outdated external link from the cache
# Adding //maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js to external cache

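The log shows the mismatch: the cache holds the normalized https:// URL, but the Scripts check registers the raw // src, so the cached entry is removed and then re-added. This can easily be fixed by using @script.url instead of @script.src in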

add_to_external_urls(@script.src, @script.line)
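The effect seen in the second-run log can be sketched with a simplified model of the cache comparison. This is an assumption about the general mechanism (cached keys compared against the URLs detected in the current run), not HTMLProofer's actual implementation; `cache_diff` is a hypothetical name.

```ruby
# Simplified model, not HTMLProofer's real code: compare the URLs stored
# in the cache with the URLs detected in the current run.
def cache_diff(cached_urls, detected_urls)
  {
    removed: cached_urls - detected_urls, # "not detected anymore"
    added: detected_urls - cached_urls    # treated as new entries
  }
end

cached     = ["https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"]
raw_src    = "//maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"
normalized = "https:#{raw_src}"

# Registering the raw src: the strings differ, so the entry is both
# removed and added again.
p cache_diff(cached, [raw_src])

# Registering the normalized URL: the strings match, nothing changes.
p cache_diff(cached, [normalized])
```

With the raw src, the diff contains one removed and one added URL, mirroring the log lines above; with the normalized URL, both lists are empty and the cached result can be reused.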

However, protocol-relative URLs are nowadays considered an anti-pattern, see e.g. https://github.com/konklone/cdns-to-https/blob/c455ec73817abff4946621402cf16f4de524d22d/README.md#background, https://www.designcise.com/web/tutorial/why-protocol-relative-url-are-no-longer-relevant, https://www.holisticseo.digital/technical-seo/web-security/protocol-relative-url.

Given this, instead of converting them to https:// internally, HTMLProofer could treat them as errors and report something like "script link //maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js is a protocol-relative URL, use explicit https:// instead".
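A minimal sketch of such a detection, assuming a standalone helper rather than HTMLProofer's real check classes (`protocol_relative_failure` and its `kind` parameter are hypothetical):

```ruby
# Hypothetical helper (not HTMLProofer's API): return a failure message
# for a protocol-relative URL, or nil when the URL is acceptable.
# `kind` is an optional label such as "script" or "image".
def protocol_relative_failure(url, kind = nil)
  return nil unless url.start_with?("//")

  subject = kind ? "#{kind} link #{url}" : url
  "#{subject} is a protocol-relative URL, use explicit https:// instead"
end

puts protocol_relative_failure("//github.com/favicon.ico", "open graph")
puts protocol_relative_failure("https://github.com/favicon.ico").inspect
```

Returning nil for acceptable URLs lets a caller add a failure only when a message is present.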

@gjtorikian, I have prepared both options: the small fix, in case we still want to allow protocol-relative URLs:

https://github.com/gjtorikian/html-proofer/compare/main...riccardoporreca:feature/750-fix-protocol-relative-script-urls-cache?expand=1

and the detection as failures:

https://github.com/gjtorikian/html-proofer/compare/main...riccardoporreca:feature/750-fail-on-protocol-relative-urls?expand=1

In the latter case, running against the protocol-relative.html above yields:

For the Favicon check, the following failures were found:
* At protocol-relative.html:8:
  favicon link //github.githubassets.com/favicons/favicon.svg is a protocol-relative URL, use explict https:// instead
For the Images check, the following failures were found:
* At protocol-relative.html:5:
  image link //upload.wikimedia.org/wikipedia/en/thumb/2/22/Heckert_GNU_white.svg/256px-Heckert_GNU_white.svg.png is a protocol-relative URL, use explict https:// instead
For the Links check, the following failures were found:
* At protocol-relative.html:6:
  //github.com/octocat/Spoon-Knife/issues is a protocol-relative URL, use explict https:// instead
* At protocol-relative.html:8:
  //github.githubassets.com/favicons/favicon.svg is a protocol-relative URL, use explict https:// instead
For the OpenGraph check, the following failures were found:
* At protocol-relative.html:7:
  open graph link //github.com/favicon.ico is a protocol-relative URL, use explict https:// instead
For the Scripts check, the following failures were found:
* At protocol-relative.html:4:
  script link //maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js is a protocol-relative URL, use explict https:// instead

Happy to create a PR for whichever of the two alternatives you deem relevant.

Yes, I think reporting as an error now makes sense, rather than silently accepting.