Field middlewares executed only once per application lifecycle

Question

Field middlewares executed only once per application lifecycle

oprypkhantc opened this issue a year ago · comments

Hey.

When working with long-running servers (such as Laravel Octane, which processes multiple requests before dying, contrary to php-fpm which only processes one), a schema is only resolved once (in both graphqlite-laravel and graphqlite-bundle), which means field middlewares are also only executed once. This is awesome for the speed, but the problem is that all subsequent requests can't change the fields dynamically. That also means that even some built-in features like #[HideIfUnauthorized] stop working on subsequent requests if instance of Schema is reused.

Iterating over fields with reflection & annotations and other things on each request is obviously not an option as it defeats the purpose of a long-running server. One option is to run all field middlewares on each request, which should be a non-breaking change. Another option is to change field middlewares so they can act as both static middleware and a "dynamic" middleware which is triggered on every request.

Any thoughts?

I'm willing to implement a fix, but we need to agree on the plan first.

Jacob Thomason · Answer 1 · Thu Mar 02 2023 04:43:02 GMT+0800 (China Standard Time)

Hmm, good question here. I’m assuming the schema is built and cached on each request and these annotations are used to build it accordingly?

Assuming that’s the case, I’d think the schema would need to be rebuilt based on request context.

Oleksandr Prypkhan · Answer 2 · Thu Mar 02 2023 05:01:27 GMT+0800 (China Standard Time)

Yes, you are correct.

Rebuilding the whole Schema from scratch (through SchemaFactory::createSchema) would result in a non-negligible performance impact - even if you reuse some of the instances, we'd still have to call FieldsBuilder::getFieldsByAnnotations on every request for every query & mutation registered, which is quite a lot :(

Instead of doing so, I think limiting it to just field middlewares should be enough? That way the whole Schema stays cached in place; when a specific field is requested, only it's middlewares are triggered, greatly reducing the amount of processing behind each request. This should work for queries, mutations and fields and satisfy the needs of #[HideIfUnauthorized].

Jacob Thomason · Answer 3 · Thu Mar 02 2023 05:08:39 GMT+0800 (China Standard Time)

I'm really not sure that the FieldsBuilder, alone, is sufficient to ensure that the schema will be accurate for all annotation defined logic. Does that cover operation annotations? There is lots of security middleware used there as well.

Let's define which annotations are dynamic in nature and of concern.

Also, it's worth considering that you could use separate graphql endpoints to define your schemas. I'm not sure that's ideal, but from a performance standpoint, that'd obviously be the best.

Oleksandr Prypkhan · Answer 4 · Thu Mar 02 2023 06:25:25 GMT+0800 (China Standard Time)

It should be accurate, given only the public APIs / documented features of graphqlite are used. Going through the annotations:

#[Mutation], #[Query], #[Field], #[MagicField], #[SourceField] are all and only annotations which define fields. There's a lot of information parsed from them and aside from the middlewares, but it doesn't/can't rely on the request. There is one exception to that, however: SourceFieldInterface - one could argue that this should be run on every request and it does make sense, so we could keep track of those and run FromSourceFieldsInterface::getSourceFields() each time
#[Type], #[ExtendType], #[Input] are static as those don't rely on the request
#[Factory], #[Decorate] are static as those don't rely on the request
#[Autowire], #[InjectUser] are static as those parameters are not shown in the Schema
#[HideParameter] is static because it always hides the parameter from the Schema, not relying on the request
#[Logged], #[Right], #[FailWith], #[HideIfUnauthorized] are all handled by AuthorizationFieldMiddleware which is partly dynamic:
- it statically changes the type of the field to nullable
- it dynamically sets field resolver based on the authorization (which comes from the request and hides the field when not authorized
#[Security] is static:
- it statically changes the type of the field to nullable
- it statically sets field resolver callback (which uses the request, but only when resolving the value - request is not used to change the field definition itself)
#[UseInputType] is static, similarly to #[HideParameter] it's only parsed once

Based on all of this, the only annotations which are non static are #[Logged], #[Right] and #[HideIfUnauthorized], although first two can be refactored so that they become static as well. This only leaves #[HideIfUnauthorized].

#[HideIfUnauthorized] and #[Logged] can as well be used on inputs, which suffer the exact same problem as regular fields. InputFieldMiddlewareInterface::process has ?InputField return type, so when a user is not authorized, null is returned from AuthorizationInputFieldMiddleware::process, hiding the input field. This means that input fields must also be fixed, likely the same exact way as regular fields.

Also, it's worth considering that you could use separate graphql endpoints to define your schemas. I'm not sure that's ideal, but from a performance standpoint, that'd obviously be the best.

I mentioned performance because the sole goal of projects like Laravel Octane is to reduce boilerplate initialization, not achieve first-class performance. It doesn't seem like much until your production app starts taking 300ms+ to process a simple GET. The point here is just to get adequate performance without fanfare. Preload/preboot/preinit as much as you, but obviously some overhead is still expected.

Jacob Thomason · Answer 5 · Thu Mar 02 2023 07:53:46 GMT+0800 (China Standard Time)

So, SourceField and MagicField annotations could become an issue since they could have middleware annotations associated, that rely on the request - right?

Where are you thinking it'd be best to handle this refreshing logic?

Oleksandr Prypkhan · Answer 6 · Thu Mar 02 2023 09:03:06 GMT+0800 (China Standard Time)

Right.

Where are you thinking it'd be best to handle this refreshing logic?

And that.. I don't know yet. Here are my thoughts so far, but I'll spend more time on that tomorrow:

webonyx/graphql does validation prior to resolving the field, and to do so they need the list of fields and types for them
thankfully, it seems like webonyx/graphql can handle dynamic field definitions through GraphQL\Type\Definition\UnresolvedFieldDefinition - every time getFields() is called, FieldDefinition can be resolved on-the-fly through a callback. The only thing that can't be changed in field's name - so if we do use this mechanism, changing field's name in a middleware would be forbidden
we could build the whole Schema once the same way it's built now, but instead of piping all QueryFieldDescriptor through middlewares to get FieldDefinitions, just wrap them all in UnresolvedFieldDescription. That way we have a callback that's called each time a field needs to be resolved. A callback could look something like this:

$fieldDescriptor = new QueryFieldDescriptor();
// do regular field parsing here

$field = new UnresolvedFieldDefinition($fieldDescriptor->getName(), function () use ($fieldDescriptor) {
    return $this->fieldMiddleware->process($fieldDescriptor, new class implements FieldHandlerInterface {
                public function handle(QueryFieldDescriptor $fieldDescriptor): ?FieldDefinition
                {
                    return QueryField::fromFieldDescriptor($fieldDescriptor);
                }
            });
});

$queryList[] = $field;

In theory, something like this should work with no changes to webonyx/graphql and great performance. If this works then there are more things to consider, like should UnresolvedFieldDefinition's callback cache the result for a single request lifespan? It probably should?

Oleksandr Prypkhan · Answer 7 · Thu Mar 02 2023 09:21:20 GMT+0800 (China Standard Time)

One more thought: we could simply allow FieldHandlerInterface to return FieldDefinition|UnresolvedFieldDefinition|null and avoid wrapping the field with UnresolvedFieldDefinition ourselves. Then if developer of a middleware wants the field definition to be dynamic, all they need is to return a UnresolvedFieldDefinition and it should just work.

Oriano de Stefani · Answer 8 · Thu Mar 02 2023 23:29:59 GMT+0800 (China Standard Time)

@oprypkhantc check out this field middleware which overwrites and wraps the actual field resolver:
https://github.com/thecodingmachine/graphqlite/blob/master/src/Middlewares/SecurityFieldMiddleware.php#L74-L97

we use this to ensure code is only executed if the field is being resolved instead of introspection or other things.

Oleksandr Prypkhan · Answer 9 · Thu Mar 02 2023 23:32:37 GMT+0800 (China Standard Time)

Based on webonyx/graphql-php#1329, this will not work unfortunately. The suggested method is to rebuild the whole Schema :/

It seems like we can rebuild the Schema optimally with some refactors. I need your approval to see if it's both worth my time and your/maintainers time maintaining it afterwards:

Instead of calling FieldsBuilder everywhere directly, I plan on refactoring it into an interface. I'll provide a CachingFieldsBuilder that caches resolved fields, and it will be reused across multiple Schema instances. I'll then wrap it in a MiddlewareFieldsBuilder which actually calls middlewares each time a field is returned. Then we just need to create a new Schema instance and it would be able to use the already parsed fields.

If you don't feel like this is worth it - no worries. I think I can create a workaround for my use case.

Oleksandr Prypkhan · Answer 10 · Thu Mar 02 2023 23:34:12 GMT+0800 (China Standard Time)

@Lappihuan This only works to resolve the value. If you want to hide the field or change any of it's details (type, description etc), you can't do that.

Oriano de Stefani · Answer 11 · Thu Mar 02 2023 23:37:54 GMT+0800 (China Standard Time)

sounds like you would need to rebuild the schema then.
this is a relevant issue from some time ago: #307

Oleksandr Prypkhan · Answer 12 · Thu Mar 02 2023 23:42:31 GMT+0800 (China Standard Time)

My main concern here is the burden on this project that comes with supporting this use case. It's not widespread yet, but it's definitely rising in popularity, especially in bigger projects. The decision on whether to support this is on you guys.

Oriano de Stefani · Answer 13 · Thu Mar 02 2023 23:50:26 GMT+0800 (China Standard Time)

you could probably check if there is a cache implementation for swoole that handles caches per request and invalidates them after.
that would probably solve most of your issues in the library.

Oriano de Stefani · Answer 14 · Thu Mar 02 2023 23:58:35 GMT+0800 (China Standard Time)

otherwise you can also overwrite the middlewares provided to always do the wrapped approach.
we for example overwrite the one handling "Logged" to invert its behavior and have our own "IsPublic" attribute.

Oleksandr Prypkhan · Answer 15 · Fri Mar 03 2023 00:12:48 GMT+0800 (China Standard Time)

There is a cache implementation. If I invalidate it after each request, it effectively means a Schema is resolved from the ground up on each request, which is not what I want. The point is the exact opposite - reuse as much existing parsed, created instances as possible.

Besides, it seems like fields are cached through lazy-init properties on the webonyx side, not through graphqlite's provided cache.

Unfortunately inverting the behaviour of #[Logged] will not change anything as long as you need to change the field definition or hide the field - middlewares are still only executed once, so any subsequent requests will use the modified FieldDefinition.

Oriano de Stefani · Answer 16 · Fri Mar 03 2023 02:15:37 GMT+0800 (China Standard Time)

not sure what your exact usecase is but i'd probably try to stay away from on the fly schemas, that kind of defeates the purpose of a schema.

if you specifically want to use Logged or HideIfNotAuthorized then maybe just throw a exception or return null if its being resolved by someone lacking the authorization.

if your usecase is a internal section of the API then maybe think about splitting your schema into a public and a internal one.
you can still try to merge them with something like mesh of you really want.

Oleksandr Prypkhan · Answer 17 · Fri Mar 03 2023 04:37:44 GMT+0800 (China Standard Time)

Agreed, I don't like the idea of modifying it on the fly too.

I wanted to use #[HideIfUnauthorized] for some admin-only mutations which I really really don't want to be visible, but I think I can simply stick to regular API endpoints for a few of those.

Besides #[HideIfUnauthorized], I also want to do Schema versioning based on a header. I can work it around by pre-resolving multiple Schema instances, one per each supported version. I'll have to see how much memory it takes for one Schema but this will probably work just fine for us. I'm aware that versioning of GraphQL is not recommended :)

So maybe it's best not to overcomplicate this package for few unpopular use cases like mine, at least for now. I think it's worth adding a section in the documentation though that middlewares are only executed once and not to use #[HideIfUnauthorized] with long-running servers.

Jacob Thomason · Answer 18 · Fri Mar 03 2023 05:42:39 GMT+0800 (China Standard Time)

@oprypkhantc async servers in PHP are certainly a growing trend, and I suspect they'll become fairly standard in coming years. I don't think your use case is unpopular, necessarily. If anything, you're just more of an early adopter. I also believe it's in the best interest of this repository to be supportive.

@Lappihuan makes a great point that dynamic schemas kinda defeat the purpose, one of the reasons we don't use #[HideIfUnauthorized]. Although, there is certainly a very valid use case for supporting multiple schemas. I think that's something GraphQLite could do a much better job with. They wouldn't need to be request dependent, instead pre-defined based on your business needs.

You can achieve some of this with different namespacing for operations and types. However, it's often ideal to not separate these out in such a way. I know for us, we'd prefer to have internal mutations sitting next to public ones in the same namespace, because our operations are grouped together based on entities/types mostly. And, some types we'd prefer to not have exposed over a public API, and only over an internal one. We're not moving around our models to accommodate this concern. Further, fields on types certainly need to differ as well. We can create different types for this case, but that's just creating more tech debt really.

Again, we're not supporting multiple schemas yet, as our internal systems hit older APIs. But, this is something we've looked into for the future. Creating custom endpoints to handle your internal API needs is an obvious solution, but is really just spreading out much of your logic and certainly won't keep things DRYed up.

Jacob Thomason · Answer 19 · Fri Mar 03 2023 06:35:33 GMT+0800 (China Standard Time)

@oprypkhantc it sounds like that would resolve your concerns - pre-defined schemas that get built accordingly? What if we introduced a new annotation argument called schema that allowed for you to define if a type or field (operation) is to be included within the present schema being built? That seems like it'd be a pretty simple solution that'd address this concern and provide better schema management overall.

In the SchemaFactory you can define which schema is being targeted.

Oleksandr Prypkhan · Answer 20 · Sat Mar 04 2023 04:01:57 GMT+0800 (China Standard Time)

@oojacoboo This will work, but I don't think this is a good solution. I still have to build the multiple schemas myself (trying to share as much dependencies as possible), manage own routes, controller etc and that seems to be the hardest part.

Filtering field is pretty trivial - I've created a #[ForVersions] attribute and use field middlewares to filter based on that. Each Schema gets it's own instance of that middleware with different parameters. Middleware system is pretty flexible and is easy to use :)

I believe the best graphqlite can do here is simplify creating multiple schemas without wasting resources on dependencies that can be shared, the rest is pretty simple.

Oleksandr Prypkhan · Answer 21 · Sat Mar 04 2023 04:14:06 GMT+0800 (China Standard Time)

Maybe given that a DI container is a required dependency, instead of creating all instances in the SchemaFactory, pull them from the container instead? The only problem is that it will have to be duplicated in graphqlite-bundle and graphqlite-laravel, but at least developers would be able to overwrite any dependency they like and it'd be easy to control whether an instance is "shared" across schemas or not - by marking it so in the container.

Jacob Thomason · Answer 22 · Sat Mar 04 2023 15:58:13 GMT+0800 (China Standard Time)

@oprypkhantc I think a fork in the schemas is much preferred, or would be in our case. Being able to use and configure the SchemaFactory to be different per schema is a huge win. There are likely many differences in how middleware should handle these requests. And, I'd imagine over time, the needs would only continue to diverge. Trying to keep these things using the same SchemaFactory and middleware, etc. seems like the wrong battle in the long run.

At the end of the day, most of this will be cached, so I don't see the issue with building up 2 different schemas entirely.

Oleksandr Prypkhan · Answer 23 · Mon Mar 06 2023 22:30:07 GMT+0800 (China Standard Time)

Okay. Let's not take any action on this then to avoid doing things entirely wrong. I'm fine with building multiple schemas semi-manually as I'm doing now.

I'll PR a documentation change to specify that middlewares are intended to be request independent as to avoid further issues like this, then I'll close this issue. Is this ok?

Jacob Thomason · Answer 24 · Tue Mar 07 2023 02:54:35 GMT+0800 (China Standard Time)

Sounds great @oprypkhantc - thanks.