cloud8421 / dragonfly-server

Elixir app to serve Dragonfly images

Idea: Cache backend fetches

felixbuenemann opened this issue

There are many scenarios in which the same images are requested from the backend multiple times within a short period:

  • Same file fetched both as a download and for a thumbnail (e.g. a PDF thumbnail)
  • Master/detail/zoom: in e-commerce applications it's common to show the same image as a thumbnail on the list page, then as another thumbnail and a small image on the detail page, and maybe even as a large/original-size image in a zoom view
  • Generating smaller images for different screen resolutions (e.g. smaller images for phones than for desktops)
  • Generating different formats for different user agents (e.g. JPEG, WebP, JPEG 2000)

When a new original is added, it is likely to stay "hot" for some time until all of the common variants have been generated.

My idea is to express the fetch operations in the same job syntax and cache them the same way as generated versions. There are some caveats when generating cache keys: signed URLs, for example, would need their cache keys derived from the unsigned URL, or the keys would be useless. Ideally, a simple fetch/download job requested by a client and the same fetch performed as part of, e.g., a thumbnail job would share a single cache key (see the sketch below).
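
To illustrate what I mean by signature-independent keys, here is a minimal sketch. The module name, the list of signing parameters, and the choice of SHA-256 are all assumptions for illustration, not anything dragonfly-server does today:

```elixir
defmodule FetchCacheKey do
  @moduledoc """
  Hypothetical cache-key derivation for backend fetches.
  Signature-related query parameters are stripped, so that
  differently signed URLs for the same resource share a key.
  """

  # Query parameters assumed to carry signing data (placeholder names).
  @signing_params ["signature", "expires", "key_id"]

  @doc "Derives a stable cache key from a fetch URL."
  def for_url(url) do
    uri = URI.parse(url)

    query =
      (uri.query || "")
      |> URI.decode_query()
      |> Map.drop(@signing_params)
      |> URI.encode_query()

    canonical = %{uri | query: if(query == "", do: nil, else: query)}

    :crypto.hash(:sha256, URI.to_string(canonical))
    |> Base.encode16(case: :lower)
  end
end
```

With something like this, `FetchCacheKey.for_url/1` would return the same key for a URL with and without its signature parameters, which is exactly what would let a plain download job and a thumbnail job hit the same cached original.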

It would probably also be a good idea to be able to configure a separate cache server for backend fetches, to avoid cache thrashing and to allow the cache size for originals to be allocated independently (see the config sketch below).
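
Purely as a sketch, a split cache setup might look like this in the app config. The `:fetch_cache_host`/`:fetch_cache_port` keys (and even the `:dragonfly_server` app name) are hypothetical, not part of the current configuration:

```elixir
# config/config.exs -- hypothetical keys, for illustration only
use Mix.Config

config :dragonfly_server,
  # existing cache for generated versions (assumed)
  cache_host: "127.0.0.1",
  cache_port: 11211,
  # separate, hypothetical cache for backend fetches,
  # sized independently from the versions cache
  fetch_cache_host: "127.0.0.1",
  fetch_cache_port: 11212
```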

What do you think about the idea?

I'm a bit undecided, because in many cases this can just as well be built with a simple reverse proxy in nginx, pointing the http_engine_host at that proxy; see the sketch below. Then again, the same can be said for the caching of job results. In my current deployments of the Rack-based Dragonfly I've simply put an nginx proxy in front of it, so the Rack app sits idle unless new images are added.
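
For reference, a minimal caching-proxy setup of the kind I mean might look roughly like this (hostnames, paths, and cache sizes are placeholders):

```nginx
# Hypothetical nginx cache in front of the image backend.
proxy_cache_path /var/cache/nginx/dragonfly
                 levels=1:2 keys_zone=dragonfly:10m
                 max_size=1g inactive=7d;

server {
    listen 8080;

    location / {
        proxy_pass        http://backend.example.com;
        proxy_cache       dragonfly;
        proxy_cache_valid 200 7d;
        add_header        X-Cache-Status $upstream_cache_status;
    }
}
```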

This was present in the initial architecture, but it was taken out for two reasons:

  • being deployed on Heroku, the HTTP fetch cache had to live in an external memcache instance.
  • the time needed to fetch from memcache was similar to the time needed to fetch over HTTP, so there was no real performance gain.

What you suggest could be useful in the case of a dedicated deployment, where all caches are kept in memory on the same machine. Happy to dig out the old commits if needed.

In terms of implementation, this is what we had: 20a1634

Thanks, I should probably build a small test script that requests multiple versions of the same original and measures the performance benefit, something like the sketch below. Caching should be most beneficial when the latency to the origin (engine) is high.
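
Something along these lines, assuming a locally running server; the version paths are placeholders, and it uses OTP's built-in :httpc so it runs without extra dependencies:

```elixir
# Hypothetical benchmark: request several versions of one original
# and report per-request timings. URL and paths are placeholders.
:inets.start()
:ssl.start()

base = "http://localhost:4000"

versions = [
  "/convert/resize/100x100/some-image.jpg",
  "/convert/resize/300x300/some-image.jpg",
  "/convert/resize/800x800/some-image.jpg"
]

for path <- versions do
  url = String.to_charlist(base <> path)

  {micros, {:ok, {{_, status, _}, _headers, body}}} =
    :timer.tc(fn -> :httpc.request(:get, {url, []}, [], [body_format: :binary]) end)

  IO.puts("#{path} -> #{status}, #{byte_size(body)} bytes in #{div(micros, 1000)} ms")
end
```

Running it twice (cold, then warm) against a build with and without backend-fetch caching should show whether the second pass actually benefits.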

Sounds good. Do you still consider this open?

Let's close for now. I can re-open if I find backend caching to be beneficial.