cloud8421 / dragonfly-server

Elixir app to serve Dragonfly images

Idea: Cache backend fetches

felixbuenemann opened this issue

There are many scenarios in which the same images are requested from the backend multiple times within a short period:

  • Same file fetched both as a download and for a thumbnail (e.g. a PDF thumbnail)
  • Master/detail/zoom: in e-commerce applications it's common to show the same image as a thumbnail on the list page, then as another thumbnail and a small image on the detail page, and maybe even as a large/original-size image in a zoom view
  • Generating smaller images for different screen resolutions (e.g. smaller images for phones than for desktops)
  • Generating different formats for different user agents (e.g. JPEG, WebP, JPEG 2000)

When a new original is added, it is likely to stay "hot" for some time until all of the common variants have been generated.

My idea is to express the fetch operations in the same job syntax and cache them the same way as generated versions. There are some caveats when generating cache keys: signed URLs, for example, would need their cache keys derived from the unsigned URL, or the keys would be useless. Ideally, a simple fetch/download job requested by a client and the same fetch performed as part of, e.g., a thumbnail job would share a single cache key (see the sketch below).
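
To illustrate what I mean by signature-independent keys, here is a minimal sketch. The module name, the list of signing parameters, and the choice of SHA-256 are all assumptions for illustration, not anything dragonfly-server does today:

```elixir
defmodule FetchCacheKey do
  @moduledoc """
  Hypothetical cache-key derivation for backend fetches.
  Signature-related query parameters are stripped, so that
  differently signed URLs for the same resource share a key.
  """

  # Query parameters assumed to carry signing data (placeholder names).
  @signing_params ["signature", "expires", "key_id"]

  @doc "Derives a stable cache key from a fetch URL."
  def for_url(url) do
    uri = URI.parse(url)

    query =
      (uri.query || "")
      |> URI.decode_query()
      |> Map.drop(@signing_params)
      |> URI.encode_query()

    canonical = %{uri | query: if(query == "", do: nil, else: query)}

    :crypto.hash(:sha256, URI.to_string(canonical))
    |> Base.encode16(case: :lower)
  end
end
```

With something like this, `FetchCacheKey.for_url/1` would return the same key for a URL with and without its signature parameters, which is exactly what would let a plain download job and a thumbnail job hit the same cached original.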

It would probably also be a good idea to be able to configure a separate cache server for backend fetches, to avoid cache thrashing and to allow the cache size for originals to be allocated independently (see the config sketch below).
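
Purely as a sketch, a split cache setup might look like this in the app config. The `:fetch_cache_host`/`:fetch_cache_port` keys (and even the `:dragonfly_server` app name) are hypothetical, not part of the current configuration:

```elixir
# config/config.exs -- hypothetical keys, for illustration only
use Mix.Config

config :dragonfly_server,
  # existing cache for generated versions (assumed)
  cache_host: "127.0.0.1",
  cache_port: 11211,
  # separate, hypothetical cache for backend fetches,
  # sized independently from the versions cache
  fetch_cache_host: "127.0.0.1",
  fetch_cache_port: 11212
```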

What do you think about the idea?

I'm a bit undecided, because in many cases this can just as well be built with a simple reverse proxy in nginx, pointing the http_engine_host at that proxy; see the sketch below. Then again, the same can be said for the caching of job results. In my current deployments of the Rack-based Dragonfly I've simply put an nginx proxy in front of it, so the Rack app sits idle unless new images are added.
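
For reference, a minimal caching-proxy setup of the kind I mean might look roughly like this (hostnames, paths, and cache sizes are placeholders):

```nginx
# Hypothetical nginx cache in front of the image backend.
proxy_cache_path /var/cache/nginx/dragonfly
                 levels=1:2 keys_zone=dragonfly:10m
                 max_size=1g inactive=7d;

server {
    listen 8080;

    location / {
        proxy_pass        http://backend.example.com;
        proxy_cache       dragonfly;
        proxy_cache_valid 200 7d;
        add_header        X-Cache-Status $upstream_cache_status;
    }
}
```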

This was present in the initial architecture, but it was taken out for two reasons:

  • being deployed on Heroku, the HTTP fetch cache had to live in an external memcache instance.
  • the time needed to fetch from memcache was similar to the time needed to fetch over HTTP, so there was no real performance gain.

What you suggest could be useful in the case of a dedicated deployment, where all caches are kept in memory on the same machine. Happy to dig out the old commits if needed.

In terms of implementation, this is what we had: 20a1634

Thanks, I should probably build a small test script that requests multiple versions of the same original and measures the performance benefit, something like the sketch below. Caching should be most beneficial when the latency to the origin (engine) is high.
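
Something along these lines, assuming a locally running server; the version paths are placeholders, and it uses OTP's built-in :httpc so it runs without extra dependencies:

```elixir
# Hypothetical benchmark: request several versions of one original
# and report per-request timings. URL and paths are placeholders.
:inets.start()
:ssl.start()

base = "http://localhost:4000"

versions = [
  "/convert/resize/100x100/some-image.jpg",
  "/convert/resize/300x300/some-image.jpg",
  "/convert/resize/800x800/some-image.jpg"
]

for path <- versions do
  url = String.to_charlist(base <> path)

  {micros, {:ok, {{_, status, _}, _headers, body}}} =
    :timer.tc(fn -> :httpc.request(:get, {url, []}, [], [body_format: :binary]) end)

  IO.puts("#{path} -> #{status}, #{byte_size(body)} bytes in #{div(micros, 1000)} ms")
end
```

Running it twice (cold, then warm) against a build with and without backend-fetch caching should show whether the second pass actually benefits.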

Sounds good. Do you still consider this open?

Let's close for now. I can re-open if I find backend caching to be beneficial.