brandonweiss / discharge

⚡️ A simple, easy way to deploy static websites to Amazon S3.

Feature request: Support cache headers on per file type basis

dbrookes opened this issue

Ideally you need to be able to set a different cache header for different file extensions to get the best performance.

For example on a static Gatsby site:

https://www.gatsbyjs.org/docs/caching/

*.html and *.js should be served with "cache-control: public, max-age=0, must-revalidate", but other static files should be served with "cache-control: public, max-age=31536000, immutable".

Thanks for the issue!

When I designed this, the way I thought about it was that although a global cache header configuration is not perfect, it's good enough for the vast majority of use-cases. Especially considering discharge allows you to configure a CDN essentially for free.

When using a CDN, a per-file cache header isn't necessary since deploying will invalidate the cache.

So what I'm wondering is, is there a specific reason you don't want to use a CDN? Especially if the goal is to get the best performance, I think a CDN would be even better than per-file caching.

This applies more to deploying a PWA rather than a Jekyll site or something where cache isn't going to break much.

It's not about invalidating the cache at the CDN edge location, it's about controlling the user's browser. Deploying an update that nobody can see because their browser is still caching the HTML is no good 👎

I'm using CloudFront, so the S3 object's cache header determines how long the object lives at an edge location, and also determines when a browser should check that edge location for a new copy of the object.

.html files should never be cached, otherwise users will have to refresh to get the latest CSS + JS, or wait until the cache header expires. That's not what we want. We do, however, want to cache the CSS and JS files (with unique file names) that belong to the current deployment indefinitely.

I'm also using a service worker for a PWA. If random bits of cache expire at different times in the browser, you get a race condition over which version of the app is current, and a broken site for the user.

As for invalidations, you only get 1000 free paths per month per AWS account, and after that they get really expensive, so it's best to avoid them unless you really screwed something up. Zero-cache your HTML, and give your JS + CSS unique file names so they can be cached. That way you effectively get free invalidations, because every deploy references new file names.
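To illustrate the "unique file names" idea, here is a minimal sketch of content fingerprinting. It is an illustration only, not how discharge, Gatsby, or any particular bundler does it. The function name `fingerprinted_name`, the SHA-256 hash, and the 8-character digest length are all assumptions made for the example.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the file extension.

    Hypothetical helper: the hash algorithm and digest length are
    arbitrary choices for this sketch.
    """
    digest = hashlib.sha256(content).hexdigest()[:length]
    original = PurePosixPath(path)
    # "assets/app.js" becomes "assets/app.<digest>.js"
    return str(original.with_name(f"{original.stem}.{digest}{original.suffix}"))


# The name changes only when the content changes, so an old copy cached
# in a browser can never be mistaken for the new file.
print(fingerprinted_name("assets/app.js", b"console.log('v1')"))
print(fingerprinted_name("assets/app.js", b"console.log('v2')"))
```

Because the uncached HTML always points at the current fingerprinted names, the old assets can stay cached indefinitely without ever being served for a new deployment.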

The goal is performance, and if you set a global cache your users are going to have a really bad time next time you deploy an update - you need to be able to set file type cache if you want to deploy a PWA and get the top page speed scores combined with the correct UX.

Thanks for explaining in detail!

Just to be clear, when using the CDN, the cache option does not determine how long the object lives at the edge location. That is fixed to one month. It does determine how long a browser caches it for, which is why I recommend setting it to a very low number.

The problem you're saying you're having with one cache header is that if you set the cache to say, five minutes, then you'll boost performance because the browser isn't requesting from the CDN as frequently, but then when you deploy and invalidate the cache it could potentially be five minutes before a browser gets the update. And if you set the cache to say, zero seconds, then the browser will get the update as soon as the cache is invalidated, but it's less performant because the browser will be requesting from the CDN more frequently. Is that right?

I use the former (a low cache) setting with a CDN for all my sites, but yes, none of them are really web applications and it's not critical that they get the latest versions of the files immediately.

Regarding the invalidations, I don't see invalidating on deploy as being a big deal. The first 1000 in a month are free, and every next 1000 would be 5 cents. I think if someone is deploying so frequently and has so many files to invalidate that they’re paying an amount of money that is problematic then I would think they’re running some sort of business where they can probably justify paying a few bucks a month to deploy so often.

If I understand your overall use-case, you're saying you both want to use a CDN and control the cache per file or maybe just file type. You want to be able to optionally disable cache invalidation entirely, or maybe just also by file type, because you’re fingerprinting your JS/CSS/image assets so those automatically get invalidated when deploying because the fingerprint changes.

If that's the case, I think the scenario you're describing falls well outside the use-case for this tool. What you want to do is totally reasonable and I would do the exact same thing if I was building a web application, but this tool is really aimed at static sites. I know PWAs are technically static sites, but this tool is aimed at simple static sites. To allow discharge to do what you want to do would add a lot of complexity and configuration options, which sort of defeats my intent in creating a simple, easy way to deploy static sites to S3.

Sorry, here's a more generic use case, with no need to disable invalidations:

The new wave of static site generators like Gatsby and Phenomic have PWA features built in, and their typical use case is very simple sites and blogs, because of the performance boost they offer, offline support, etc. Currently they don't work with Discharge unless you set the cache option to 0, but then images, fonts and other static assets are not cached.

If you could exclude some file types from being cached that would allow them to work:

"cache": {
  "default": 3600,
  "exclude": ["html", "js"]
}

Then any file whose extension is listed in exclude gets a "no-cache, no-store, must-revalidate" header, and everything else gets the default cache duration.
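A rough sketch of how such a config could be resolved into a header per file, assuming the shape proposed above. This is not discharge's implementation: the function name `cache_control_for` is hypothetical, and the "public, max-age=N" format for non-excluded files is an assumption about what the default duration would translate to.

```python
from pathlib import PurePosixPath

# Config shape copied from the proposal above; it is a suggestion in this
# thread, not a released discharge option.
CONFIG = {"cache": {"default": 3600, "exclude": ["html", "js"]}}

NO_CACHE = "no-cache, no-store, must-revalidate"


def cache_control_for(key: str, config: dict) -> str:
    """Return the Cache-Control value for an object key (hypothetical helper)."""
    cache = config["cache"]
    # ".HTML" and ".html" are treated the same; files without an
    # extension fall through to the default duration.
    extension = PurePosixPath(key).suffix.lstrip(".").lower()
    if extension in cache.get("exclude", []):
        return NO_CACHE
    # Assumed header format for everything that is not excluded.
    return f"public, max-age={cache['default']}"


for key in ["index.html", "app.js", "images/logo.png", "fonts/body.woff2"]:
    print(key, "->", cache_control_for(key, CONFIG))
```

The resulting string is what would be set as the object's Cache-Control metadata at upload time, so the only per-file decision the tool has to make is the extension lookup.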

Oh, OK! Perhaps I was over-complicating it. I think conceptually a cache duration with an optional array of exclusions isn't burdensome.

There might be a way to combine the cache and cache_control options as well here—I wasn’t really happy with the complexity of having two mutually-exclusive options at the top-level of configuration.