apollographql / subscriptions-transport-ws

:arrows_clockwise: A WebSocket client + server for GraphQL subscriptions

Home Page: https://www.npmjs.com/package/subscriptions-transport-ws

Subscription activation confirmation message

prokher opened this issue · comments

Dear Colleagues!

I am trying to implement a WebSocket-based GraphQL server in Python. I took your protocol description as a spec for the GraphQL over WebSocket protocol. It works quite nicely (especially since we use Apollo client on the frontend), thank you for this description.

One thing that bothers me is that the WebSocket client does not receive a confirmation that a subscription is activated. Since different requests can be processed asynchronously on the backend side, it is sometimes necessary for the client to wait until the subscription is activated before proceeding further.

Consider a client that starts a job (some long backend process) by invoking a mutation "StartJob". In order to receive notifications about job progress, the client subscribes to the "OnJobUpdate" subscription just before running "StartJob". A race condition occurs, because it is unknown which of these two requests is processed first. The client can wait for a query or mutation to finish (for the data and complete messages), but it looks like there is no message that confirms that the subscription request was processed.

How do you handle such an issue in your WebSocket transport?

UPDATE: When HTTP is used, the client can always wait for the HTTP response, but with WebSocket there is no mandatory response event, so the problem occurs only when the WebSocket transport is used.

Dear @mistic and @dotansimha, sorry for bothering, but since you are the authors of the PROTOCOL.md, perhaps you can comment on this issue?

@prokher thanks for the comments on our previous protocol work. When we wrote this protocol, a long time ago, we designed it to be simple and to save network traffic, and in my past usage the protocol was able to cover every situation I needed it for. Also, the GraphCool guys have been using it since then and I haven't heard about problems.

For example, in the situation you mentioned, don't you want to use the mutation's GQL_DATA to get the data you need (instead of using the subscription for it)? Then the subscription can be used for further updates from other 'users' in the system who also run the mutation you mentioned. The only thing you need to take care of is to ignore the subscription event you'll get if it was already processed by the server, but only on the current client from which you're sending the mutation (you can use a lot of strategies for it), because other logged-in clients for the same user (for example mobile) will need the subscription to update the data, if this is the case.

It's worth mentioning that in all of my previous projects where I used apollo-client with this protocol, I always needed to develop an extra RxJS plugin on top to deal better with apollo-client, subscriptions-transport-ws and the boilerplate of sending and receiving messages. At the time we were using apollo-client@1.X.X and I found it to be a missing feature in apollo-client. I don't know how things are going right now!

I hope it helps!

@mistic Thank you for the thoughtful reply! Indeed you've done a great job with the protocol design. As a result, the protocol you described turned out to be simple and powerful at the same time.

As for the possible solution you mentioned, I do not understand how using the mutation's GQL_DATA may help. In the case I am describing, I am trying to use subscriptions to inform the client about the progress of some long-running backend process.

Let me illustrate it with a particular example. Consider a file copying process. I am trying to implement it as follows.

  1. Client generates a unique job_id (GUID).
  2. Client subscribes to the progress messages of the process with job_id (since the process has not started yet, the client hopes to receive all of them):
subscription op1 {
    job(job_id: ...) {
        readiness
        status
        result
    }
}
  3. Client initiates a file copy process, telling the server that the process is identified by job_id:
mutation op2 {
    cp(from: ..., to: ..., job_id: ...) {
        status
        job_id
    }
}
  4. Client receives subscription notifications until the process finishes.

The problem is that if the process finishes very fast, the client does not receive a single subscription notification, because the server processes the subscription request (step 2) and the mutation request (step 3) asynchronously, and in some cases the mutation is processed before the subscription "activates".
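The race can be reproduced with a toy in-memory simulation. This is only a sketch: `ToyServer`, `fire_both` and `wait_then_fire` are hypothetical names invented for illustration, there is no real WebSocket or GraphQL involved, and the activation latency is simulated with a sleep.

```python
import asyncio


class ToyServer:
    """In-memory stand-in for a server; not a real GraphQL transport."""

    def __init__(self):
        self.subscribers = {}  # job_id -> list that collects notifications

    async def subscribe(self, job_id, inbox, setup_delay):
        # Simulated latency before the subscription becomes active.
        await asyncio.sleep(setup_delay)
        self.subscribers[job_id] = inbox

    async def run_job(self, job_id):
        # A very fast job: it publishes its only update immediately.
        inbox = self.subscribers.get(job_id)
        if inbox is not None:
            inbox.append("done")


async def fire_both(setup_delay):
    """Send the subscription and the mutation without waiting in between."""
    server = ToyServer()
    inbox = []
    sub = asyncio.create_task(server.subscribe("job-1", inbox, setup_delay))
    await server.run_job("job-1")  # processed before the subscription is active
    await sub
    return inbox


async def wait_then_fire(setup_delay):
    """Start the job only after the subscription is known to be active."""
    server = ToyServer()
    inbox = []
    await server.subscribe("job-1", inbox, setup_delay)
    await server.run_job("job-1")
    return inbox


lost = asyncio.run(fire_both(0.01))
received = asyncio.run(wait_then_fire(0.01))
print(lost, received)
```

In the first scenario the notification is dropped because nobody is registered yet; in the second it is delivered. The second scenario is only possible when the client has some signal that activation has finished, which is what the protocol currently lacks.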

The same problem occurs when a client sends several consecutive mutations to the server, but with mutations the client is able to wait for one mutation to complete before sending the next one (the client can wait for the GQL_DATA/GQL_COMPLETE messages). Unfortunately, there is no message to wait for to be sure the subscription is activated.

Any thoughts? I could extend the protocol by adding a subscription confirmation message, but I am not sure it is a proper way to go. Moreover on the client side we use Apollo GraphQL client which implements this particular protocol, so if I change it, I would also need to modify the client accordingly. Again, it can be done, but I am not sure if it is correct. That is why I am trying to ignite the discussion here.

I would just like to share my use case which I am trying to seek a solution to. I have a series of integration tests which support real-time multi-user editing. As part of this, I need to verify that all mutations produce the correct event using the subscription.

The challenge I face is that, since my client doesn't know when the subscription is active, it can sometimes start the mutations too early and end up missing some notifications. I'm getting intermittent failures in my test cases, and the only solution I have so far is to introduce a sleep and hope for the best.

Just for information. In the WebSocket GraphQL server I am working on, I had to add an option to issue an extra DATA-message ({'data':null}) which confirms subscription activation, so client can wait for it. Probably, it could be useful for this project as well.
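On the client side, waiting for such a confirmation could look roughly like the sketch below. It assumes the server sends a `data` message whose payload is exactly `{"data": null}` for the subscription's operation id, as described above. The helper `wait_for_activation` and the use of an `asyncio.Queue` as a stand-in for the socket's incoming messages are assumptions made for illustration, not part of any library.

```python
import asyncio


async def wait_for_activation(incoming, op_id, timeout=1.0):
    """Read messages until the activation confirmation for op_id arrives.

    Messages that arrive in the meantime are returned so the caller can
    still process them. Raises asyncio.TimeoutError if nothing arrives
    within `timeout` seconds.
    """
    deferred = []
    while True:
        msg = await asyncio.wait_for(incoming.get(), timeout)
        is_confirmation = (
            msg.get("type") == "data"
            and msg.get("id") == op_id
            and msg.get("payload") == {"data": None}
        )
        if is_confirmation:
            return deferred
        deferred.append(msg)


async def demo():
    # The queue plays the role of the decoded WebSocket message stream.
    incoming = asyncio.Queue()
    await incoming.put(
        {"type": "data", "id": "other", "payload": {"data": {"x": 1}}}
    )
    await incoming.put({"type": "data", "id": "op1", "payload": {"data": None}})
    return await wait_for_activation(incoming, "op1")


deferred = asyncio.run(demo())
print(deferred)
```

Only after `wait_for_activation` returns would the client send the mutation. The timeout matters because a server without this option never sends the confirmation, and the client should not block forever.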

commented

I commend the simplicity of the protocol. In my view, this simplicity should also cover the use case mentioned above and the fact that the current protocol lacks support for it is a serious limitation. @prokher's workaround kind of works if you assume the client will gracefully handle the {'data':null} response and you have control over it. But there is a need for a more principled solution at the protocol level because:

  • Server implementors typically have no control over clients
  • WebSocket-based GraphQL subscriptions are becoming common
  • Out-of-order execution/asynchronous responses are a prime feature of many GraphQL backend libraries

Have we considered adding a server->client message such as the one below?

GQL_SUBSCRIPTION_ACK

The server responds to the client with a GQL_SUBSCRIPTION_ACK for any GraphQL subscription initiated by the client, indicating that the subscription is active. May optionally include a payload.

For backwards compatibility purposes, if a GQL_DATA is received before GQL_SUBSCRIPTION_ACK, then the client can assume that the subscription was acknowledged.
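A client could apply the proposed rule along these lines. This is a sketch only: the proposal names the message GQL_SUBSCRIPTION_ACK but does not define its wire value, so the `"subscription_ack"` type string is an assumption, and `SubscriptionTracker` is a hypothetical helper rather than anything in subscriptions-transport-ws.

```python
# Hypothetical wire value; the proposal does not specify the "type" string.
GQL_SUBSCRIPTION_ACK = "subscription_ack"
GQL_DATA = "data"          # existing message type in the protocol
GQL_COMPLETE = "complete"  # existing message type in the protocol


class SubscriptionTracker:
    """Tracks which subscription operation ids are known to be active."""

    def __init__(self):
        self.active = set()

    def on_message(self, msg):
        """Return True if this message is the one that marks activation."""
        op_id = msg.get("id")
        kind = msg.get("type")
        if kind in (GQL_SUBSCRIPTION_ACK, GQL_DATA):
            # An explicit ack activates the subscription; so does a first
            # data message (the backwards compatibility rule).
            if op_id not in self.active:
                self.active.add(op_id)
                return True
            return False
        if kind == GQL_COMPLETE:
            self.active.discard(op_id)
        return False


tracker = SubscriptionTracker()
events = [
    tracker.on_message({"type": "subscription_ack", "id": "op1"}),
    tracker.on_message({"type": "data", "id": "op1", "payload": {"data": {}}}),
    tracker.on_message({"type": "data", "id": "op2", "payload": {"data": {}}}),
    tracker.on_message({"type": "subscription_ack", "id": "op2"}),
]
print(events)
```

Here `op1` is activated by the explicit ack and `op2` by its first data message, so a late ack for `op2` is ignored. The sketch assumes it is only fed messages for subscription operations; queries and mutations would need to be filtered out by the caller.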

A key part of adding this new message is dealing with protocol compatibility. For this approach to work, we need some kind of versioning built into the protocol so that the client never waits for the subscription acknowledgement if the server doesn't implement this new message (just like it works today).

In addition to that, I also added a backwards compatibility comment to the GQL_SUBSCRIPTION_ACK specification to cover the case where a client doesn't implement the new protocol message but the server does. In this scenario, though, we can also use the versioning information in the protocol to help resolve the situation. The comment merely gives us a bit more fault tolerance.

If we all agree this is a desired approach, I'm happy to make a pull request adding this new protocol message. What do you folks think about this solution?

This project is not being actively developed at this time. You may want to look into the graphql-ws project.

commented

@glasser Thanks for the heads up. Do you mean enisdenjo/graphql-ws? I see there's another graphql-ws project under the Python organization so it isn't clear.

Is there any place where I can learn more about why this project is not actively developed? Is there any way I can chime in on the maintenance front and help move this forward?

Yes, that's the project I mean. Note that it uses a different protocol so you have to change your clients if you switch to it. That protocol might fix this issue!

This project was started at Apollo about 5 years ago but never really got fully integrated across the Apollo platform. We are hoping to come back to subscriptions sooner rather than later and make them a fully supported part of the Apollo platform!