humanmade / tachyon

Faster than light image resizing service that runs on AWS. Super simple to set up, highly available and very performant.

Home Page: https://engineering.hmn.md/projects/tachyon/

Idea to route requests to cache file

joehoyle opened this issue · comments

Currently there is no disk cache for Tachyon files, only the CloudFront edge cache. By introducing an S3 cache and Edge function, we might be able to solve a couple problems in one:

  • Tachyon should cache its output to S3 when it generates a file. We can use S3 object lifecycle rules for simple expiry.
  • We add a Lambda@Edge function which checks for the S3 file; if it exists, it adjusts the origin request so the file is routed directly to S3.
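The object-lifetime expiry mentioned in the first bullet could be configured with a standard S3 lifecycle rule, e.g. (the bucket name and the 30-day window here are hypothetical):

```shell
# Expire cached renditions automatically; bucket name and expiry are illustrative.
aws s3api put-bucket-lifecycle-configuration \
  --bucket tachyon-cache \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-tachyon-cache",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 30}
    }]
  }'
```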

I'm not 100% sure you can do this with Edge functions (adjust the origin), but I think you can. We already do it with an X-WebP header which changes the CloudFront behaviour.

By routing cache requests directly to S3, we can save on many Lambda invocations, and we can also get around the 5MB limit we currently have for image responses.

The uncached Lambda process will still be subject to the 5MB response limit, which I don't think we'd be able to get around, but it would mean only the first request would fail; subsequent cached requests to S3 would be fine.

Thoughts @nathanielks @faishal perhaps?
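The origin rewrite proposed above could be sketched as a CloudFront origin-request handler along these lines; the bucket name, the cache-key derivation, and the injected `existsInCache` check (standing in for an S3 HeadObject call) are all hypothetical:

```javascript
'use strict';

// Sketch of an origin-request handler that reroutes to an S3 cache bucket
// when a pre-rendered file already exists there.
const CACHE_BUCKET = 'tachyon-cache'; // hypothetical bucket name
const CACHE_DOMAIN = CACHE_BUCKET + '.s3.amazonaws.com';

function rerouteToCache(request, existsInCache) {
  // Naive cache key for illustration: path plus query string.
  const key = request.uri.replace(/^\//, '') + '?' + request.querystring;
  if (!existsInCache(key)) {
    // No cached file: fall through to the Tachyon Lambda origin.
    return request;
  }
  // Cached file exists: point the request at the S3 bucket instead.
  request.origin = {
    s3: {
      domainName: CACHE_DOMAIN,
      region: '',
      authMethod: 'none',
      path: '',
      customHeaders: {},
    },
  };
  request.headers.host = [{ key: 'Host', value: CACHE_DOMAIN }];
  return request;
}
```

A cache hit then never touches the resize Lambda at all; a miss is handled exactly as today.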

A potential option without the need to dynamically change origins could be to use an Origin Response Lambda: we set up /tachyon to point to S3, then use the Origin Response event to detect when the file isn't found and generate the image in that case.

Also, what a clever solution to getting around the size limit! That hadn't occurred to me till just now. Regarding how to get around it... in my scenario when it generates the image when it doesn't exist... would it be possible to generate it, upload it to S3, then issue a redirect so that the image would be served directly from S3 instead of the Lambda function?

I'm not 100% sure you can do this with Edge functions (adjust the origin), but I think you can. We already do it with an X-WebP header which changes the CloudFront behaviour

Yes, we can do that.

The uncached Lambda process will still be subject to the 5MB response limit, which I don't think we'd be able to get around, but it would mean only the first request would fail; subsequent cached requests to S3 would be fine.

For large image sizes, we can return a signed S3 URL with a 302 redirect to avoid failing the request. The next request will be served directly from S3.
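That size check could be sketched like this, assuming the signed S3 URL has already been generated elsewhere and using the 5MB figure from the discussion:

```javascript
'use strict';

// Sketch: fall back to a 302 when the rendered image exceeds the response
// cap. The limit value and the signed URL are taken as given; in real code
// the URL would come from an S3 getSignedUrl call.
const RESPONSE_LIMIT = 5 * 1024 * 1024; // 5 MB cap discussed above

function buildResponse(imageBuffer, signedS3Url) {
  if (imageBuffer.length > RESPONSE_LIMIT) {
    // Too big to return inline: redirect the client to S3.
    return { statusCode: 302, headers: { Location: signedS3Url }, body: '' };
  }
  // Small enough: return the image inline, base64-encoded for Lambda.
  return {
    statusCode: 200,
    isBase64Encoded: true,
    body: imageBuffer.toString('base64'),
  };
}
```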

One advantage of the Origin-Response solution would be that it'd only be triggered when the image doesn't exist in the edge cache, whereas if we're attempting to change the origin, it'd have to be Viewer-Request, which would mean considerably more Lambda executions.

We wouldn't be able to do the image processing in Lambda@Edge (origin response etc.), because Tachyon won't run in Lambda@Edge due to limitations on execution time etc. We need the Tachyon resize task to run in Lambda proper.

The 302 to S3 is interesting; we'd need to make sure CloudFront wouldn't cache that response. Or, could we do a 302 to the same URL, so once the image is finished being processed, it would just 302 back to the same logic, which this time finds it on S3 :mindblown:
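The flow being discussed could look roughly like this as an origin-response handler, including a `no-store` header so CloudFront doesn't cache the temporary redirect; `generateImage` here is a hypothetical stand-in for the Tachyon resize-and-upload step:

```javascript
'use strict';

// Sketch: on a 404 from the S3 origin, generate the image and answer with
// a non-cacheable 302 back to the same URL, which finds the file on S3 on
// the retry.
function handleOriginResponse(request, response, generateImage) {
  if (response.status !== '404') {
    // Cache hit on S3: pass the response through unchanged.
    return response;
  }
  // Cache miss: resize and upload to S3 (hypothetical helper).
  generateImage(request.uri, request.querystring);
  return {
    status: '302',
    statusDescription: 'Found',
    headers: {
      location: [{ key: 'Location', value: request.uri + '?' + request.querystring }],
      // Keep CloudFront from caching the temporary redirect.
      'cache-control': [{ key: 'Cache-Control', value: 'no-store' }],
    },
  };
}
```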

I checked the function limits and it says Origin requests have a 30 second timeout. Do you think that'd be enough? https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/cloudfront-limits.html#limits-lambda-at-edge

Ah, I didn't realize origin requests had better limits; in that case it could be possible!

So, we have a couple options:

  1. S3 as the origin.
  2. Two origins: Lambda & S3

In case 1 (S3 as the origin) we'd have Lambda@Edge running on an Origin-Response trigger looking for 404 responses, running Tachyon "inline" in that Lambda function, and returning the response (or a 302). I think we'll need an Origin-Request Lambda@Edge function here too, to map a Tachyon URL with query params to a filename on S3.

In case two, we'd have a Lambda@Edge Viewer-Request function check if a cache file exists, and if so, change the request path to match the S3 origin. If it doesn't exist, it will fall through to the Lambda function, which will save the file to S3 and return it in the response (or a 302).

In both cases we also need a Lambda@Edge Viewer-Request function to check for WebP support, so CloudFront caches accordingly.

Am I getting that all right?
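The WebP viewer-request check mentioned above could be as simple as normalizing the Accept header into the X-WebP header from earlier in the thread (the header name comes from the discussion; the exact scheme here is an assumption):

```javascript
'use strict';

// Sketch of a viewer-request handler that flags WebP support in a
// normalized header, so CloudFront can vary its cache on it instead of
// on the full Accept header.
function flagWebpSupport(request) {
  const accept = (request.headers.accept || [])
    .map((h) => h.value)
    .join(',');
  const supported = accept.includes('image/webp');
  request.headers['x-webp'] = [{ key: 'X-WebP', value: supported ? '1' : '0' }];
  return request;
}
```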

In case two, we'd have a Lambda@Edge Viewer-Request function check if a cache file exists, and if so, change the request path to match the S3 origin. If it doesn't exist, it will fall through to the Lambda function, which will save the file to S3 and return it in the response (or a 302).

We'd also need to check whether we can return a response from an Origin-Request Lambda@Edge invocation; I'm not sure if that is possible.

Do you think that'd be enough?

Damn, looked at CloudWatch for Tachyon and there are quite a number of > 30 second executions. Most of them fall under that, but we'd have a few errors. Alternatively the Origin Response lambda could simply trigger the Tachyon Lambda proper so that it doesn't have to wait for the request to complete, but the file would be unavailable for a time.

In both cases we also need a Lambda@Edge Viewer-Request function to check for WebP support

This is a great point, and I hadn't thought of it. Honestly, I'd try to incorporate this functionality into the existing Viewer Request function so we don't get any more invocations levied against us!

Yeah, though viewer-request limits are certainly too small to actually do a resize in.

Damn, looked at CloudWatch for Tachyon and there are quite a number of > 30 second executions. Most of them fall under that, but we'd have a few errors. Alternatively the Origin Response lambda could simply trigger the Tachyon Lambda proper so that it doesn't have to wait for the request to complete, but the file would be unavailable for a time.

I think we might be able to increase the memory / CPU allocation. The Lambda function in our case mostly scales with the CPU available (not true for the S3 transfer part of it!), so if we increase CPU, we should get billed for fewer seconds.

Yeah, though viewer-request limits are certainly too small to actually do a resize in.

For sure. It does seem to fit the use case of what we're looking for at least!

We'd also need to check whether we can return a response from an Origin-Request Lambda@Edge invocation; I'm not sure if that is possible.

The docs I referenced earlier mention the allowed size for a response body, so I believe it can!

I think we might be able to increase the memory / CPU allocation.

That we can! Origin functions can be as large as regular lambda functions, so we could definitely size up if we wished.

Stumbled on this while researching something else, but here's an AWS article on routing requests to specific origins using Lambda@Edge: https://aws.amazon.com/blogs/networking-and-content-delivery/dynamically-route-viewer-requests-to-any-origin-using-lambdaedge/

Are you guys familiar with tachyon-edge? It works more or less like you have described.
I have tried it but will probably end up implementing hm-tachyon.
IMO, in most use cases, the CloudFront cache should work well.
The main exception, of course, is if you are serving multiple regions in a significant way.

Also, the time to check if the file exists on the S3 cache is probably not negligible.
I tend to prefer the simpler architecture; however, the TTL on the cache sounds interesting.

Another possibility I am willing to explore, without Lambda@Edge, would be to use the 500MB /tmp as a Lambda container cache. I think this could probably work well, especially if you set up API Gateway with a regional Lambda.

I am definitely not an expert on AWS. So, what are your thoughts? Does it make sense?

Another possibility I am willing to explore, without Lambda@Edge, would be to use the 500MB /tmp as a Lambda container cache. I think this could probably work well, especially if you set up API Gateway with a regional Lambda.

Yeah, I think that could work too, though I don't know how many containers are typically running at once. Also, API Gateway supports caching too; the main issue there is that it looked very expensive.

Since we needed an extra cache (so the images are not resized for every edge location), we implemented what was discussed here:
#126

I'm going to close this as not planned. People can use Origin Shield to prevent a cache per edge location in CloudFront too.