mercurius-js / mercurius

Implement GraphQL servers and gateways with Fastify

Home Page:https://mercurius.dev/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Subscriptions in the federated service are not disposed correctly in gateway mode

hizo opened this issue · comments

GraphQL subscriptions are not terminated correctly when there is gateway in the chain. Works fine when client connects directly to a backend service.

I have created a minimalistic reproduction with a service, gateway and frontend https://github.com/hizo/mercurius-gateway-subscriptions-repro . You can find all necessary information in the readme.

Mercurius version: 11.5.0
Note: I have also tried with latest v12 version and @mercuriusjs/gateway and @mercuriusjs/federation packages - the issue is still there, but to debug I needed to insert logs into the mercurius dependency of each gateway and federation packages.

Actual result:
While it seems subscriptions are closed in gateway when there is a variable change, it's not promoted to the federated service and a subscription callback is executed as many times as variables changed.

Expected result:
Subscription termination is propagated to the federated service and subscription callback is triggered only for the active subscription.

Thanks for reporting! Would you like to send a Pull Request to address this issue? Remember to add unit tests.

@mcollina I'm afraid the reproduction is as far as I can get at the moment as we are just before a first release of a client project where this issue was noticed.

I think this is something that we (specifically @marco-ippolito) came across in the past when extracting subscription-client from this repo.

If I remember correctly what we noticed was this:

  • as the extraction had considerable risks of breaking changes, we tested the extracted subscription client in a "black box" fashion, meaning that ran a ws client against it. We did spot a strange behavior which seems to correspond to what @hizo is describing in this issue. @marco-ippolito initially captured it in this comment
  • on the other hand, we tested the same against the original version of mercurius with the still-embedded subscription-client, and noticed the same behavior
  • hence, we thought we'd caught a red herring and we were content that, although that behavior was somewhat weird, maybe we were missing something an in any case the before-after behavior was the same, so we were confident we'd not broken anything

This report makes me think that there was, and there still is, an issue in the way that the gateway handles subscriptions

I don't think this is a new bug either, it's something that was present in the previous codebase as well.

@RomuloVitoi is working on this

@hizo @mcollina

What we're missing here is a step when we receive the complete message in the gateway, the message is never forwarded to the service so it can close the subscription. I've tested by sending a message through the subscription client on a onSubscriptionEnd hook that was added to the gateway and it worked.

The issue is that we're using different subscription ids on the gateway and the service, the gateway uses the id that comes from the query and the id that is sent to the service is the operation id that is a counter from the subscription client (https://github.com/mercurius-js/mercurius-subscription-client/blob/main/lib/subscription-client.js#L296).

I've made some changes so the onSubscriptionEnd receives the subscription id and implemented a subscription map so I could map the subscription id to the operation id. The changes were based on
11.5.0 just because it was easier to test it in a single package: RomuloVitoi@0a8ab52

This is the hook that was added to the gateway:

app.graphql.addHook('onSubscriptionEnd', async (context, data) => {
  const { service, subscriptionId } = context.subscriptionMap.get(data.id)
  service.client.unsubscribe(subscriptionId)
});

Another approach instead of creating a subscription map would be using the subscription id as the operation id on the subscription client, that will make the subscription ids to be the same on both gateway and client but requires breaking changes on the subscription client.

Let me know what you think is the best approach.

I'd rather change subscription client than add a new hook

@marco-ippolito it is not adding new hooks just adding a new parameter to the existing onSubscriptionEnd, I've used the hook in this case because it was the easiest way to integrate my code with the gateway, but ideally this should live in https://github.com/mercurius-js/mercurius-gateway and not involve hooks.

I've made some changes so the onSubscriptionEnd receives the subscription id and implemented a subscription map so I could map the subscription id to the operation id.

I think this is the most sensible approach.

Another approach instead of creating a subscription map would be using the subscription id as the operation id on the subscription client, that will make the subscription ids to be the same on both gateway and client but requires breaking changes on the subscription client.

I don't understand how breaking this would be. I'm fine if the breakage is constrained within the subscription-client itself.

@mcollina any concerns about the subscriptions map within the gateway having a possibly unbounded growth? I assume it will contain all mappings for all subscriptions for all connections to the gateway, so potentially a very large in-memory data structure.

@simoneb that was my first thought as well

I'm not 100% convinced we could do without. Maybe let's give the change in subscriptionclient a shot?

I think before jumping to a solution we should check out how others do it. I recall that Apollo hasn't had (maybe still doesn't have) subscriptions support in federation. A quick search turned up this article: https://www.apollographql.com/blog/backend/federation/using-subscriptions-with-your-federated-data-graph/.

Let's maybe spend some time finding inspiration in existing solutions (or lack thereof). @RomuloVitoi ok for you?

I don't understand how breaking this would be. I'm fine if the breakage is constrained within the subscription-client itself.

My idea was to make the createSubscription method receive the subscriptionId as a parameter and replace the operationId parameter from unsubscribe with the subscriptionId, but now I see that the client would still need a map for subscriptionId -> operationId because we don't want to change the behavior of having only a single active subscription per query-parameters combination on a service.

This is what I suspected, I think we should keep the map in this module then.

Let's go with the map then @RomuloVitoi . A couple of things which would be good to have a clearer understanding of:

  • have you figured out anything interesting from the Apollo blog post about subscriptions in federation?
  • is it worth considering an entirely different approach for subscriptions in federation which relies on a new module/service we'd build, rather than extending the existing modules?
  • did you figure out if subscriptions in federation is something that Apollo still don't support natively?
  • if we do go with the map approach, since the in-memory approach doesn't scale, we should probably consider implementing a pluggable API for that map, one which can easily be implemented on top of redis for example

@simoneb Apollo don't support it natively, that blog post contained an example of an implementation of a separate service that handles subscriptions that the gateway cannot. The client connects to the gateway over HTTP and the subscription service over WS, so it connects to two different services. The subscription service needs to subscribe to every topic that your service will publish on a redis instance and handle every topic by querying the gateway, the graphql service also needs to be modified to connect and publish to that redis instance instead of the regular pubsub that it usually does, it is not as plug-and-play as the mercurius implementation.

I also think the option to use a different storage for the mappings should exist for scalability but currently the SubscriptionClient already depends on in-memory storage for the operations map, we could tackle both later in a different issue.

Thanks @RomuloVitoi, let's get rolling with this then 👌

@hizo the PRs were merged and released, would you mind checking again on gateway v1.2.0 and mercurius v12.1.0?