DataLoader dispatches together keys from different requests
edacostacambioupgrade opened this issue · comments
When using what I think is a standard setup (graphql-spring-boot with DataLoaderDispatcherInstrumentation and DataLoaderRegistry singleton beans), when two (HTTP) requests from different callers request the same data type by the same key (i.e. use the same DataLoader), all keys are enqueued and dispatched together: BatchLoader.load(List&lt;K&gt; keys) is called with keys merged from both requests.
I have not used the facebook node implementation but from what I understand, their DataLoaders are created per-request, so this merging doesn't happen.
While this behavior may be desirable in some cases it comes with some drawbacks:
- issues with keys in one request affect the other request, and this is not very deterministic (unless your backing service is smart enough to return per-key errors)
- if one request loads 1 key and another loads 1K keys, both will have the latency of loading 1001 keys, and again, this is not very deterministic.
- if you are propagating authentication and your backing service only takes a global authentication principal (i.e. an authorization header), you cannot send the requests together anyway; you need to split by requester (or execution id). (you could live with this if your backing service took a per-key principal, but that would be pretty ugly, i think)
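To make the merging concrete, here is a minimal self-contained model of the behavior described above (plain Java, no graphql-java types; SharedBatchQueue is a hypothetical stand-in for a singleton DataLoader, not the real class):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical stand-in for a singleton DataLoader: every caller shares one queue.
class SharedBatchQueue<K, V> {
    private final List<K> keys = new ArrayList<>();
    private final List<CompletableFuture<V>> futures = new ArrayList<>();
    private final Function<List<K>, List<V>> batchLoader;

    SharedBatchQueue(Function<List<K>, List<V>> batchLoader) {
        this.batchLoader = batchLoader;
    }

    synchronized CompletableFuture<V> load(K key) {
        keys.add(key);
        CompletableFuture<V> future = new CompletableFuture<>();
        futures.add(future);
        return future;
    }

    // dispatch() sends EVERY queued key, regardless of which request queued it.
    synchronized void dispatch() {
        List<V> values = batchLoader.apply(new ArrayList<>(keys));
        for (int i = 0; i < values.size(); i++) {
            futures.get(i).complete(values.get(i));
        }
        keys.clear();
        futures.clear();
    }
}

public class MergedBatchDemo {
    /** Returns the batches the backing loader actually saw. */
    static List<List<String>> run() {
        List<List<String>> observedBatches = new ArrayList<>();
        SharedBatchQueue<String, String> queue = new SharedBatchQueue<>(batch -> {
            observedBatches.add(batch);
            List<String> values = new ArrayList<>();
            for (String key : batch) values.add("value-of-" + key);
            return values;
        });
        // Request A and request B each enqueue a key before either dispatches...
        queue.load("a1");
        queue.load("b1");
        // ...so dispatch() produces ONE merged batch containing both keys.
        queue.dispatch();
        return observedBatches;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Both callers now share the fate of one backing call, which is exactly the interference described above.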
i wonder:
- is this behavior intentional?
- is this a problem with the way I have it set up?
- would you be open to a PR that lets devs choose whether or not to merge keys?
if this is an issue with my setup then you can skip the rest; otherwise, these are the options i'm considering at the moment:
- wrapping the BatchLoader.load(...) method with one that splits by execution id. this solves some interference issues, but it still makes all concurrent requests wait until everyone else's data is available.
- subclassing DataLoader to implement something like sliceIntoBatchesOfBatches, but slicing by execution id. this could work, but it has a few issues:
  - most of the things i would need to change in the DataLoader class are private, so it would involve either copying code or gaining access by reflection :S
  - this is fine for the DataLoader.dispatch() method because it doesn't wait for the overall result, but dispatchAndJoin() would still wait for every request to finish. i don't mind, because i don't use it and the instrumentation only ends up calling dispatch()
  - while this approach won't make callers wait, it would still sometimes dispatch some keys of other requests "early", maybe even before they are completely enqueued, occasionally resulting in more backing requests in a non-deterministic way
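A rough sketch of that per-execution slicing, in plain Java (the ExecKey composite key and the innerLoad delegate are assumptions of mine for illustration, not the real BatchLoader API):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class PerExecutionSplitter {
    // Hypothetical composite key: the execution id travels with the key itself.
    record ExecKey(String executionId, String key) {}

    /**
     * Wraps a batch-load function so each execution's keys go to the backing
     * service in a separate call, instead of one merged call for everyone.
     */
    static Map<ExecKey, String> loadSplitByExecution(
            List<ExecKey> keys,
            BiFunction<String, List<String>, List<String>> innerLoad) {
        // Group keys by execution id, preserving insertion order.
        Map<String, List<ExecKey>> byExecution = new LinkedHashMap<>();
        for (ExecKey k : keys) {
            byExecution.computeIfAbsent(k.executionId(), id -> new ArrayList<>()).add(k);
        }
        // One backing call per execution id.
        Map<ExecKey, String> results = new LinkedHashMap<>();
        byExecution.forEach((executionId, group) -> {
            List<String> rawKeys = group.stream().map(ExecKey::key).toList();
            List<String> values = innerLoad.apply(executionId, rawKeys);
            for (int i = 0; i < group.size(); i++) {
                results.put(group.get(i), values.get(i));
            }
        });
        return results;
    }
}
```

This also shows why per-request authentication becomes possible with this split: innerLoad receives the execution id, so a per-execution authorization header could be attached to each backing call.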
- another option i considered is to make the DataLoader a per-request object, so DataLoaders are entirely isolated. this isn't easy though: i would need to provide a means for a DataFetcher to access the right DataLoader for a given request. with some effort i could keep a map by execution id, but it is not easy to manage its life-cycle (i fear i would end up with leaked instances).
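The map-by-execution-id idea (and its leak risk) could look roughly like this; Loader here is a hypothetical placeholder type, not the real DataLoader class:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Sketch of the "per-request DataLoader" idea: loaders are created lazily per
 * execution id and must be removed explicitly when the execution ends.
 */
public class PerExecutionLoaders<L> {
    private final Map<String, L> loadersByExecution = new ConcurrentHashMap<>();
    private final Supplier<L> loaderFactory;

    public PerExecutionLoaders(Supplier<L> loaderFactory) {
        this.loaderFactory = loaderFactory;
    }

    /** DataFetchers would call this with the current execution id. */
    public L forExecution(String executionId) {
        return loadersByExecution.computeIfAbsent(executionId, id -> loaderFactory.get());
    }

    /**
     * Must be called when the execution finishes (success OR error); if any
     * code path skips it, entries leak -- the life-cycle worry mentioned above.
     */
    public void release(String executionId) {
        loadersByExecution.remove(executionId);
    }

    public int activeExecutions() {
        return loadersByExecution.size();
    }
}
```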
this is what i would like:
option 1
- DataLoader.dispatch(), DataLoader.dispatchAndJoin() and DataLoaderRegistry.dispatchAll() should take an executionId as a parameter. depending on a data loader option, either all requests are dispatched or only requests for that execution id are dispatched. the DataLoader.load(K key) method would also need to take in an execution id (or a DataFetchingEnvironment).
- DataLoaderDispatcherInstrumentation.dispatch() passes the execution id to DataLoaderRegistry.dispatchAll()
- DataLoaderDispatcherInstrumentation.beginExecution(instrumentationParameters).onEnd(...) calls a new method DataLoaderRegistry.discardAll(ExecutionId) (that calls a new DataLoader.discard(ExecutionId) method) to make sure appropriate cleanup happens in case of errors/abortion.
  - would that be enough cleanup, or is there any case in which keys may have been queued but beginExecution.onEnd is not called?
option 2
similarly, but without changing DataLoader: make DataLoaderRegistry aware of executions and keep a map of executionId -> DataLoaders (it would need to be built with DataLoader suppliers instead of DataLoaders directly). with this approach only the DataLoaderRegistry.dispatchAll() method needs to be modified to take in the execution id. in this case the DataLoaderRegistry would need to expose a means to retrieve the DataLoader for a specific execution, for DataFetchers to use.
option 3
same thing but managed by the instrumentation: change DataLoaderDispatcherInstrumentation to take a DataLoaderRegistry supplier instead of a DataLoaderRegistry. this supplier (or the instrumentation) would need to expose a method to return the DataLoaderRegistry associated with an execution id, so that DataFetchers can get the right one.
Wait... Why are you making DataLoaderRegistry a singleton if you want it per request? A singleton DataLoaderRegistry is only applicable to a very specific use-case and is not common at all.
What is normally done is having a DataLoaderRegistry created per request and stored into the global context for the execution, e.g.
DataLoaderRegistry dataLoaderRegistry = ...; // create per request

// Transform the pre-configured GraphQL instance or create a new one
GraphQL runtime = graphQL.transform(builder -> builder.instrumentation(
        new DataLoaderDispatcherInstrumentation(dataLoaderRegistry)));

// Make dataLoaderRegistry accessible to fetcher functions
ExecutionInput.newExecutionInput()
        .query(...)
        .context(dataLoaderRegistry)
        .build();
This is very simple and requires no low-level concurrency control nor keeping track of executions. So I think there's nothing wrong with the current implementation.
oh i see... thanks! i didn't realize that creating a GraphQL object was so lightweight. so it is indeed a problem with my setup.
By looking at graphql-spring-boot's GraphQLWebAutoConfiguration.graphQLServlet(...), it looks like i have to declare my instrumentation and data loader registry beans with @RequestScope and then add a GraphQLContextBuilder that creates a context with the request-scoped registry. is that right?
closing this, it's a non-issue, thanks a lot!
although i found it a bit unintuitive, so i'm leaving this here for other noobs like me. i had to do this:
@Bean
@RequestScope
public DataLoaderRegistry dataLoaderRegistry() {
...
}
@Bean
@RequestScope
public Instrumentation instrumentation(DataLoaderRegistry dataLoaderRegistry) {
return new DataLoaderDispatcherInstrumentation(dataLoaderRegistry);
}
but while ExecutionInput is ok with any Object as the context, GraphQLServlet.createContext(..) wants a GraphQLContext instance (see SimpleGraphQLServlet's GraphQLContextBuilder field too), so i had to create a GraphQLContextBuilder implementation that returns a subclass of GraphQLContext instead of setting the registry directly. not a big deal, but i wonder if everyone is doing this, or if people are using singletons without realizing the consequences? (or perhaps there is a more straightforward way that i'm not seeing)
to set the registry in the context i had to do this:
@Bean
public GraphQLContextBuilder graphQLContextBuilder(DataLoaderRegistry dataLoaderRegistry) {
// note that dataLoaderRegistry is a request scoped proxy.
return new GraphQLRequestContextBuilder(dataLoaderRegistry);
}
and here are my context builder and context subclass:
public class GraphQLRequestContextBuilder implements GraphQLContextBuilder {
private final DataLoaderRegistry dataLoaderRegistry;
// ... constructor ...
@Override
public GraphQLContext build(Optional<HttpServletRequest> request, Optional<HttpServletResponse> response) {
return new GraphQLRequestContext(request, response, dataLoaderRegistry);
}
}
public class GraphQLRequestContext extends GraphQLContext {
private final DataLoaderRegistry dataLoaderRegistry;
public GraphQLRequestContext(Optional<HttpServletRequest> request, Optional<HttpServletResponse> response, DataLoaderRegistry dataLoaderRegistry) {
super(request, response);
this.dataLoaderRegistry = dataLoaderRegistry;
}
// ... getter ...
}
then my DataFetchers do:
DataFetchingEnvironment environment = ...;
GraphQLRequestContext context = environment.getContext();
DataLoaderRegistry registry = context.getDataLoaderRegistry();
DataLoader<K, V> dataLoader = registry.getDataLoader(name);
return dataLoader.load(key);
@edacostacambioupgrade I found the servlet overly convoluted, so I normally advise a simple Spring controller, as it's a lot more obvious.
I'm just curious: since your DataLoaderRegistry is already request scoped, you could directly inject it instead of keeping it in the context, right?
@kaqqao i didn't try, but i'm not sure i can inject them in my datafetchers, because (i think) the request-scoped proxies are bound by spring to the current thread; i suspect spring will not find the right loader if the fetching happens on a different thread.
(i guess something like this answer would be needed, or i would need to decorate the tasks of the executors to pass the request context around anywhere an async task/thread is fired, but i didn't like either solution)
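The executor-decoration idea can be sketched in plain Java, independent of Spring: capture the thread-bound value when the task is created and restore it on the worker thread. (REQUEST_CONTEXT here is a hypothetical ThreadLocal standing in for Spring's request attributes.)

```java
import java.util.concurrent.Callable;

public class ContextPropagation {
    // Hypothetical stand-in for Spring's thread-bound request attributes.
    static final ThreadLocal<String> REQUEST_CONTEXT = new ThreadLocal<>();

    /**
     * Decorates a task so the submitting thread's context is visible
     * on whatever worker thread eventually runs it.
     */
    static <T> Callable<T> propagating(Callable<T> task) {
        String captured = REQUEST_CONTEXT.get(); // capture at submit time
        return () -> {
            String previous = REQUEST_CONTEXT.get();
            REQUEST_CONTEXT.set(captured); // restore on the worker thread
            try {
                return task.call();
            } finally {
                REQUEST_CONTEXT.set(previous); // don't leak into pooled threads
            }
        };
    }
}
```

Every task handed to an executor would need to pass through propagating(...), which is exactly the "decorate everywhere an async task is fired" burden mentioned above.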
yeah, maybe having my own controller would be better, but by briefly looking at the servlet i see it has a lot of stuff (multipart handling, callbacks, etc) and i don't know enough to tell whether i will need that, and i don't want to reimplement it in my controller if i eventually do.
i think i will end up subclassing the SimpleGraphQLServlet, where i can create a new instrumentation and create the context without having to use request-scoped beans.