elastic / package-storage

Package storage for packages served through the package registry service

[Discuss] Caching time of assets on production

ruflin opened this issue · comments

All the assets served by the package-registry are going through the CDN and can be cached by the CDN. This issue is to discuss the best caching times in our production environment. The current values are:

cache_time.search: 1m
cache_time.categories: 10m
cache_time.catch_all: 10m

Cache values explained

To better understand what each value means, here are a few details on which ones need updating and when. The /search endpoint needs an update every time a new version of a package is released. This cache time is effectively the maximum delay until a released package shows up in the overview.

When a new package is released, the category counters must also be updated. This is less critical than the /search endpoint, as it only serves counters. During this cache window the counters might not be 100% accurate, but that has no direct effect on the user.

All the other assets are static. A released package should never change (this was not always true in the past, but hopefully will be going forward), and the same holds for all the assets inside a package. New package assets are available immediately, because they always live under new paths. The edge case is a package being removed from the registry, in which case the cache will keep serving it for a bit longer.

There are two additional endpoints. The / index endpoint, which returns info about the registry, currently falls under the catchAll cache, but I think it should have its own cache time: elastic/package-registry#631. It is important that a new version number becomes available quickly. The other endpoint is /health, which is not cached at all.

Recommended cache values

In a perfect world, the catch_all value could be set to a year or more, as these assets should never change. But as we are still changing the registry and the packages, I would not set it too high for now. A higher catch_all value also means less traffic to the CDN, as the assets will be cached locally by the browser.

My current recommendation for production would be:

cache_time.index: 10s  # Assuming we have https://github.com/elastic/package-registry/pull/631
cache_time.search: 1m
cache_time.categories: 10m
cache_time.catch_all: 1h

For snapshot and staging, the values should stay low.

@ycombinator @kuisathaverat @mtojek @ph Would be great to get your feedback on this one.

@ruflin Your values make sense, but I am curious about the strategy. Since we control the deployment of the packages, could we also invalidate the cache on push?

@ph We could, but it makes the deployment much more complex, as we would suddenly need access to Fastly at deployment time. Not purging also has the advantage that if something goes wrong, the cache is still available to serve the content, and as far as I remember there is a price tag attached to purging. Lastly, I don't see much benefit in purging if we keep the cache times low for the dynamic endpoints.

@ruflin Good point on permissions, but it's something to consider once we have CD enabled; we could combine it with the health check and just purge the cache.

+1 on the proposed configuration so we can move on.

Closing as this was changed in #380