Studiosity / grover

A Ruby gem to transform HTML into PDFs, PNGs or JPEGs using Google Puppeteer/Chromium

The HTMLPreprocessor does not replace the relative URLs in the assets, e.g. CSS

longnd opened this issue

Thank you for the work and so much effort put in the gem :)

Please correct me if I'm wrong, but based on the code of the HTMLPreprocessor, it does not replace the relative URLs in the asset files, e.g. CSS files.

module HTMLPreprocessor
  # Change relative paths to absolute, and relative protocols to absolute protocols
  def self.process(html, root_url, protocol)
    html = translate_relative_paths(html, root_url) if root_url
    html = translate_relative_protocols(html, protocol) if protocol
    html
  end

  # ... (rest of the module omitted)
end

Example: given this HTML code

<html>
  <head>
    <link rel="stylesheet" href="/asset/application.css" />
  </head>
  <body>...</body>
</html>

and CSS code

/* application.css */
@font-face {
  font-family: "Mitr";
  src: local("Mitr"), url(/assets/Mitr-300-Light-d38fd8a6500fcf1cdd774a2a4ed2d8ff75da77d7944fb60ea1e2940c8a54aa96.ttf) format("truetype");
}

running the HTML code through the HTMLPreprocessor, e.g.

absolute_html = Grover::HTMLPreprocessor.process relative_html, 'https://my.server/', 'https'

the above HTML code will be turned into

<html>
  <head>
    <link rel="stylesheet" href="https://my.server/asset/application.css" />
  </head>
  <body>...</body>
</html>

but the CSS code is not updated, so the asset URLs inside it (the font URL in the above example) remain unchanged. Would it be worth mentioning this in the README file?

Hi @longnd

Hmm, yes, interesting problem. The issue I see here is that the HTML preprocessor, as the name suggests, pre-processes the HTML (i.e. it processes the static content that you've passed to it BEFORE it is rendered in the browser). To fix the relative paths in the CSS as you described, you would need something that updates the content of the CSS while, or once, the page is actually being loaded in the browser!

Fixing that asset at runtime would be possible, as the grover JS processor intercepts all page requests so that it can inject the HTML content you've provided into the page. So grover could potentially intercept the CSS requests and apply some runtime processing to "fix" those relative paths too. See https://github.com/Studiosity/grover/blob/main/lib/grover/js/processor.cjs#L162-L170, specifically L165, where we currently just let the request continue. We'd need to modify that to identify whether relative path fixing is necessary and whether the request is for CSS, and if so, intercept the request. I'm not 100% sure it would work... but it "should" do!
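
For illustration only, the "fix those relative paths" step could look roughly like the sketch below. It is written in Ruby to match the snippets above, whereas the actual processor is JavaScript, so treat it as a description of the transformation rather than of grover's code. The helper name `absolutize_css_urls` is hypothetical and not part of grover. It assumes the CSS text and the URL to resolve against are already in hand, and it only handles plain `url(...)` references.

```ruby
require 'uri'

# Hypothetical helper (NOT part of grover): rewrite url(...) references in a
# CSS string so that relative ones become absolute, resolved against base_url.
def absolutize_css_urls(css, base_url)
  # Matches url(x), url('x') and url("x"), allowing whitespace inside the parens
  css.gsub(/url\(\s*(['"]?)([^'")]+?)\1\s*\)/) do |match|
    quote = Regexp.last_match(1)
    target = Regexp.last_match(2)

    # Leave fragment-only references and anything that already has a scheme
    # (https:, data:, ...) exactly as it was written
    next match if target.start_with?('#') || target.match?(/\A[a-z][a-z0-9+.-]*:/i)

    # URI.join raises URI::InvalidURIError for targets that are not valid URIs
    "url(#{quote}#{URI.join(base_url, target)}#{quote})"
  end
end

css = 'src: local("Mitr"), url(/assets/Mitr.ttf) format("truetype");'
puts absolutize_css_urls(css, 'https://my.server/')
# prints: src: local("Mitr"), url(https://my.server/assets/Mitr.ttf) format("truetype");
```

The font file name is shortened here for readability. Passing the stylesheet's own URL as `base_url` (rather than the site root) would also resolve references such as `../fonts/x.woff` the way a browser does.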

But to answer your question, yes, some sort of errata in the README describing the current state of this would be a good idea. Are you able to put something together?