Implement Persisted Queries mechanism to easily allow server-side Caching

Question

WizMik opened this issue a year ago · comments

One of the main problems with GraphQL is its lack of compatibility with server-side caching mechanisms.

Although it's an option to put a middleware in front of GraphQLite and handle the caching ourselves, this solution is not necessarily convenient with every framework (Symfony, for example) and makes the setup complicated.

One solution could be HTTP caching. It fits well because it does exactly what we need here: caching responses (queries) that don't change much over time. But while HTTP caching is quite straightforward with RESTful APIs, thanks to the unique nature of a RESTful URL, it's more challenging with GraphQL, which is commonly served over POST through a single URL, making it incompatible with the HTTP cache standard.

A solution could be serving GraphQL queries over GET, but that often ends up creating incredibly long URLs and causes other problems.

This issue affects all GraphQL users and has been discussed in many ways (by the GraphQL team and the Apollo team), and one solution could be Persisted Queries.

The idea would be to hash the whole query and its variables into a unique string that is passed to the server in place of the full payload. Doing so would shorten the request enough to make it compatible with the HTTP GET method, or otherwise give each request a unique identity as far as the HTTP cache standard is concerned.
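As a rough illustration of the hashing idea above, here is a minimal Python sketch (Python is used only for brevity; GraphQLite itself is PHP). The function names and the `id` URL parameter are hypothetical and not part of any library. It assumes the key is derived from a canonical JSON encoding of the query and variables, so that the same request always maps to the same short URL.

```python
import hashlib
import json
from urllib.parse import urlencode


def persisted_key(query: str, variables: dict) -> str:
    """Return a SHA-256 hex digest identifying a query plus its variables."""
    # Canonical JSON (sorted keys, fixed separators) keeps the digest stable
    # even when the client builds the variables in a different order.
    canonical = json.dumps(
        {"query": query, "variables": variables},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def short_url(endpoint: str, query: str, variables: dict) -> str:
    """Build a short GET URL; "id" is a hypothetical parameter name."""
    return endpoint + "?" + urlencode({"id": persisted_key(query, variables)})
```

Whatever the size of the query, the resulting URL carries a fixed 64-character key, which is what makes it practical to use with GET.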

A good article to read about it is the one written by Leonardo Losoviz. It describes the problem very well and offers glimpses of solutions to implement.

Apollo Client also offers another way to create Persisted Queries, which could be a good starting point for implementing such a feature in GraphQLite. One good thing about the Apollo mechanism is that it could allow an extended implementation of common caching standards, including PSRs, regardless of the HTTP cache standard.

That being said, I might have missed other best practices on this matter, or the fact that it's already possible to work with a server-side cache and GraphQL. Your opinion at TCM on this topic would be very interesting to read, aside from the fact that it would be a great feature for the library.

Oriano de Stefani · Answer 1 · Fri Mar 03 2023 18:19:44 GMT+0800 (China Standard Time)

@WizMik there is a relevant answer by the author here: #125 (comment)

WizMik · Answer 2 · Sat Mar 04 2023 01:37:09 GMT+0800 (China Standard Time)

Indeed, @moufmouf pointed out the middleware caching solution, which in my opinion is clearly the best option. However, it requires intervening before and after the GraphQLiteMiddleware. Wouldn't it be nice to have it built in, since GraphQLite already wraps Webonyx?

Jacob Thomason · Answer 3 · Sat Mar 04 2023 16:11:01 GMT+0800 (China Standard Time)

@WizMik what's being cached exactly? Are you caching the response on these requests and trying to match for a GET to return the same response? If so, what about request headers like Authorization that'd identify a specific user? A GET request doesn't always return the same response.

Did you take a look at the potential solution I offered regarding stateless routes, similar to what Symfony does?
#125 (comment)
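
To make the Authorization concern above concrete, here is a small Python sketch of a response-cache key that includes the headers a response may vary on. The function name and the default `vary` tuple are assumptions made for illustration; they do not come from GraphQLite, Symfony, or any specific HTTP cache.

```python
import hashlib


def http_cache_key(method: str, url: str, headers: dict, vary=("Authorization",)) -> str:
    """Build a cache key from the request line plus the headers that vary the response."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    parts = [method.upper(), url]
    for name in vary:
        parts.append(f"{name.lower()}={lowered.get(name.lower(), '')}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

With `Authorization` in the key, two users sending the same GET URL land on different cache entries. With an empty `vary`, they would share one, which is only safe for public data.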

WizMik · Answer 4 · Sun Mar 05 2023 03:56:07 GMT+0800 (China Standard Time)

For now I've implemented a temporary solution that extends the GraphQLiteController. It's inspired by the Persisted Query spec described by Apollo, but my goal is to match that exact behaviour so that it works with their client.

What happens is that the server checks whether a hash is given with the request. This hash is based on the query and its parameters, which makes it unique, as is the expected response. If provided, it is used as a key to store a Response object in the cache or return an existing one, avoiding the HTTP cache (which, after all, is not a big deal).

For now I use it for public pages to speed up search engine crawling, but as you say, it raises the question of Authorization for other queries. I did it at the GraphQLiteController level rather than as a PHP attribute, to avoid serializing complex objects and to cache the whole Response object directly.
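
The behaviour described in this comment can be sketched roughly as follows. This is not the actual controller code, which is PHP and not shown in the thread. `ResponseCache` is a hypothetical in-memory stand-in for the real cache backend, and responses are represented as plain strings.

```python
from typing import Callable, Dict, Optional


class ResponseCache:
    """In-memory stand-in for whatever cache backend the application uses."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get_or_compute(self, key: Optional[str], compute: Callable[[], str]) -> str:
        # No hash sent with the request: run the query as usual, cache nothing.
        if key is None:
            return compute()
        # First request with this hash: execute once and keep the whole response.
        if key not in self._store:
            self._store[key] = compute()
        return self._store[key]
```

Because the key ignores who is asking, this variant is only appropriate for public responses, as the comment itself notes.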

Jacob Thomason · Answer 5 · Sun Mar 05 2023 03:58:39 GMT+0800 (China Standard Time)

So, it sounds like what I'm suggesting is exactly what you want then.

WizMik · Answer 6 · Sun Mar 05 2023 04:02:22 GMT+0800 (China Standard Time)

As you also said, stateless queries might be challenging for logged-in users?

Jacob Thomason · Answer 7 · Sun Mar 05 2023 04:03:13 GMT+0800 (China Standard Time)

@WizMik it's an opt-in argument on the attribute. It's a total non-issue, unless the developer is being careless. And, in that case, it's on them.

Oleksandr Prypkhan · Answer 8 · Fri Mar 10 2023 18:06:30 GMT+0800 (China Standard Time)

@oojacoboo It's not what you thought. When speaking about Apollo's persisted queries, their main point is to shorten the payload by passing the query's SHA256 hash instead of the query itself. Apollo docs describe it very well.

Indeed persisted queries will help with HTTP-level caching by allowing simple GET requests (instead of all POSTs), but persisted queries themselves are meant to solve a different problem - huge payloads when selecting a lot of fields. It'd be nice to have support for persisted queries natively.

Jacob Thomason · Answer 9 · Sat Mar 11 2023 12:42:04 GMT+0800 (China Standard Time)

@oprypkhantc yes, I realize they're not solving the exact same issue. How does a hashed version of the payload work for caching where queries will differ per request based on the User? Will the Authorization and other headers be taken into consideration?

Also, I'm assuming GraphQLite really wouldn't need to be concerned with caching based on persisted queries. It'd just simply decode the hash and process the payload normally.

Oleksandr Prypkhan · Answer 10 · Fri Jul 28 2023 22:42:23 GMT+0800 (China Standard Time)

Back onto this, I've implemented this mechanism in our package. I'd love to backport this to graphqlite if possible.

@oojacoboo It works as expected with different queries. The thing is that only the query string is cached, not variables/schema/AST, meaning it's essentially just "find the query string by that hash; if found, use that query as if it was passed to the server directly". If two different users make a request with the same hash, well, that only means they executed the same query, but nothing else is cached or shared.

The thing is it's not an encrypted payload, it's just the hash, so you can't decode it. An encoded string would probably be the same size, if not larger than the query itself, so it would make little sense to do so.
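
A minimal sketch of the lookup described above, assuming a plain dictionary as the store. All names are hypothetical, and `execute` is a toy stand-in for real query execution. It illustrates that only the query string is stored, that a hash can only be looked up (never decoded), and that the result still depends on who runs the query.

```python
import hashlib
from typing import Dict, Optional

# hash -> query string; this mapping is the only thing that gets stored.
query_store: Dict[str, str] = {}


def remember(query: str) -> str:
    """Store the query under its SHA-256 digest and return that digest."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    query_store[digest] = query
    return digest


def resolve(digest: str) -> Optional[str]:
    # A hash cannot be decoded back into a query; it can only be looked up.
    return query_store.get(digest)


def execute(query: str, user: str) -> dict:
    # Hypothetical executor: the result depends on the caller, not on the store.
    return {"data": {"viewer": user}, "query": query}
```

Two users resolving the same hash get the same query string back, but each execution produces its own response.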

The implementation of caching is actually quite trivial (1 new file and additions to configuration) thanks to webonyx/graphql-php supporting this mechanism out of the box, so it shouldn't be a large maintenance burden.

Jacob Thomason · Answer 11 · Mon Jul 31 2023 13:26:57 GMT+0800 (China Standard Time)

@oprypkhantc I'm assuming the primary goal here is HTTP caching then? And GraphQLite is simply going to de-hash that and call the query? I'm guessing whatever HTTP caching service is being used will take Authorization headers into account?

So there is less interest here in a server-side solution, and the approach mostly leans on GET at the HTTP layer? Assuming the HTTP caching layer takes the Authorization header into account, this is probably the simplest and most performant solution.

I assume GraphQLite would only attempt to process the hash if it's a GET request.

I'd certainly welcome a PR for review (doc updates needed).

Oleksandr Prypkhan · Answer 12 · Mon Jul 31 2023 17:18:31 GMT+0800 (China Standard Time)

Well, kind of. The goal is to reduce payload from client to server for faster networking, but HTTP caching is not used here. The algorithm is as follows:

- user configures a cache backend for automatic persisted queries, say Redis (or, when possible, something like in-memory cache from Swoole)
- client sends a GET /graphql?hash=asjkdajksdjk request
- webonyx/graphql-php sees a hash request parameter and calls GraphQLite's handler
- GraphQLite's persisted query handler checks if that hash is in the cache:
  1. if it is, it pulls the actual query from cache and returns it. webonyx/graphql-php then uses it as if the client sent a GET /graphql?query=query { field } from the start
  2. if it isn't, it throws an error with a specific code. That code is sent back to the client, letting it know the query wasn't found. The client then sends both the query and the hash for the server to remember in cache: GET /graphql?query=query { field }&hash=asjkdajksdjk. The request is processed as normal, but a query is also written to the cache, so that next time a server receives GET /graphql?hash=asjkdajksdjk request, it will simply pull the query from cache
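
The exchange above can be simulated with a short Python sketch. `Server` and `Client` are hypothetical classes written for illustration, the cache is a plain dictionary standing in for Redis, and query execution is faked with a string. The error code is the one Apollo's automatic persisted queries protocol uses for a cache miss.

```python
from typing import Dict, Optional

NOT_FOUND = "PERSISTED_QUERY_NOT_FOUND"  # error code from Apollo's APQ protocol


class Server:
    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}  # stand-in for Redis or similar

    def handle(self, hash_: str, query: Optional[str] = None) -> dict:
        if query is not None:
            # Query and hash sent together: remember the pair, then execute.
            self.cache[hash_] = query
        elif hash_ in self.cache:
            # Hash only, and it is known: use the stored query.
            query = self.cache[hash_]
        else:
            # Hash only, and it is unknown: tell the client to resend the query.
            return {"errors": [{"extensions": {"code": NOT_FOUND}}]}
        return {"data": f"executed: {query}"}


class Client:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.round_trips = 0

    def send(self, hash_: str, query: str) -> dict:
        # Optimistically send the hash alone.
        self.round_trips += 1
        response = self.server.handle(hash_)
        errors = response.get("errors", [])
        if any(e["extensions"]["code"] == NOT_FOUND for e in errors):
            # Cache miss: retry with both the query and the hash.
            self.round_trips += 1
            response = self.server.handle(hash_, query)
        return response
```

In this simulation the first `send` for a given hash costs two round trips, and every later one costs a single short request.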

Whether or not to call GraphQLite's persisted query handler is decided on the webonyx side, so we don't care if it's a GET or a POST, although they may have some checks on their side - I'm not sure.

Jacob Thomason · Answer 13 · Mon Jul 31 2023 17:57:01 GMT+0800 (China Standard Time)

As far as the caching is concerned, what about supporting authentication, beyond a random hash? I guess with Redis, you can set a timeout, which is good. And maybe that's enough in most cases for reasonable security. But an extra layer of authentication, such that hashes are grouped by an authenticated user in your cache store, would add some extra comfort. Then only the hashes provided by a given authenticated user could be used.
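
For illustration only, grouping hashes by authenticated user could look like the sketch below. `PerUserQueryStore` is a hypothetical class, not something GraphQLite or webonyx/graphql-php provides, and the user id is assumed to come from whatever authentication layer is in place.

```python
from typing import Dict, Optional, Tuple


class PerUserQueryStore:
    """Persisted-query store in which each hash is scoped to the user who registered it."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    def save(self, user_id: str, digest: str, query: str) -> None:
        self._store[(user_id, digest)] = query

    def find(self, user_id: str, digest: str) -> Optional[str]:
        # A hash registered by one user is invisible to every other user.
        return self._store.get((user_id, digest))
```

The trade-off in this sketch is that every user has to register each query once, so the store grows with the number of users and the first request per user always carries the full query.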

Where is the cache implementation handled? Would that be left up to the dev in the middleware?

Is there any standard around these hashes, or is it left to the client to use pretty much anything? Also, is this the reference standard Apollo has implemented?

Oleksandr Prypkhan · Answer 14 · Mon Jul 31 2023 18:30:48 GMT+0800 (China Standard Time)

> As far as the caching is concerned, what about supporting authentication, beyond a random hash?

Let me make a PR and you'll see why authentication is irrelevant here :) The short explanation is that regardless of authentication or other payload, if two queries are the same string, they'll always be hashed to the same hash. So if a cache is hit twice by different users, they both effectively executed the same query string anyways (even if both got different responses based on authentication), so there's really no reason to worry about authentication.

> Is there any standard around these hashes, or is it left to the client to use pretty much anything?

As far as I know, Apollo is the only widespread standard out there. Apollo defines specific parameter names (handled by webonyx already) and error codes to use, as well as the hashing method - SHA256.

> Also, is this the reference standard Apollo has implemented?

Yes: https://www.apollographql.com/docs/apollo-server/performance/apq/
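
As a sketch of what that protocol looks like on the wire, the following Python builds a GET URL carrying the `extensions` parameter Apollo documents (`persistedQuery` with `version` and `sha256Hash`), and checks on the receiving side that a supplied query really hashes to the supplied value. The helper names are hypothetical and the `/graphql` endpoint is just the path used earlier in this thread.

```python
import hashlib
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def apq_extensions(query: str) -> dict:
    """Build the extensions payload: version 1 plus the query's SHA-256 hex digest."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    return {"persistedQuery": {"version": 1, "sha256Hash": digest}}


def apq_url(endpoint: str, query: str, include_query: bool = False) -> str:
    """Hash-only URL by default; include the query when registering it."""
    params = {"extensions": json.dumps(apq_extensions(query), separators=(",", ":"))}
    if include_query:
        params["query"] = query
    return endpoint + "?" + urlencode(params)


def hash_matches(url: str) -> bool:
    """Check that the query in the URL hashes to the sha256Hash sent alongside it."""
    params = parse_qs(urlsplit(url).query)
    sent = json.loads(params["extensions"][0])["persistedQuery"]["sha256Hash"]
    query = params["query"][0]
    return hashlib.sha256(query.encode("utf-8")).hexdigest() == sent
```

Verifying the hash before storing a query prevents a client from registering one query under another query's hash.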