Supporting Incremental Delivery with @defer/@stream directives

Question

potatosalad opened this issue 8 months ago

I wanted to open an issue for discussion of incremental delivery with the @defer and @stream directives in Argo.

Following the specs in graphql/graphql-spec#742, which is now in the "Draft (RFC 2)" state, the following is a potential wire-type solution for dealing with these incremental responses:

schema {
  query: Query
}

type Query {
  root: Object!
}

type Object {
  children: [Object!]!
}

query {
  root {
    required: __typename
    ... @defer {
      deferred_x: __typename
    }
    ... @defer(label: "defer_z") {
      deferred_y: __typename
    }
    children_x: children @stream {
      streamed_x: __typename
    }
    children_y: children @stream(label: "stream_z") {
      streamed_y: __typename
    }
  }
}

The examples below use an ERROR wire type that is an alias to the following:

{
  message: STRING<String>
  location?: {
    line: VARINT<Int>
    column: VARINT<Int>
  }[]
  path?: PATH
  extensions?: DESC_OBJECT
}

In addition, the examples use a new wire type referred to as UNION (a tagged union type similar to the one found in BARE).

{
  data?: {
    root: {
      required: STRING<String>
      deferred_x?: STRING<String>
      deferred_y?: STRING<String>
      children_x: {
        streamed_x: STRING<String>
      }[]
      children_y: {
        streamed_y: STRING<String>
      }[]
    }
  }?
  incremental?: UNION {
    <0>: {
      path: PATH
      data: {
        deferred_x: STRING<String>
      }?
      errors?: ERROR[]?
      extensions?: DESC_OBJECT
    }
    <1:"defer_z">: {
      path: PATH
      data: {
        deferred_y: STRING<String>
      }?
      errors?: ERROR[]?
      extensions?: DESC_OBJECT
    }
    <2>: {
      path: PATH
      items: {
        streamed_x: STRING<String>
      }[]
      errors?: ERROR[]?
      extensions?: DESC_OBJECT
    }
    <3:"stream_z">: {
      path: PATH
      items: {
        streamed_y: STRING<String>
      }[]
      errors?: ERROR[]?
      extensions?: DESC_OBJECT
    }
  }[]
  hasNext?: BOOLEAN<Boolean>
  errors?: ERROR[]?
  extensions?: DESC_OBJECT
}

Any initial thoughts or opinions? Thanks!

Mike Solomon · Answer 1 · Sat Feb 10 2024 03:09:16 GMT+0800 (China Standard Time)

I think this makes a lot of sense. In Argo, we'd probably want to note that it's subject to change to whatever is eventually incorporated into the GraphQL spec.

Here are a variety of notes and thoughts:

It's a small bummer to introduce a new UNION type, but I think it's the best option. I think we'd use a VARINT tag for implementation simplicity, even though the negative numbers will be wasted (so using many different @defer/@stream in the same query will use a few extra bytes). One alternative to UNION would be to fake it in a synthetic RECORD with lots of fields (perhaps one RECORD for @stream and one for @defer) but that feels unnecessarily awkward to me, and the payloads would be unnecessarily large.
The items field should be nullable in @stream payloads (I see you already made data nullable for @defer payloads)
Inside the union, the path is fully knowable for @defer, and knowable except the final index for @stream. We could avoid encoding it (perhaps including only the index), or drop the known prefix as we do in Argo Field errors and add it back in after decoding. I'm inclined toward this last, for consistency's sake.
Inside the union, label (if any) is always known. We can avoid sending it over the wire or including it in the wire type, but make it available to users. (I think you had this in mind, guessing from your UNION syntax)
In incremental, I think extensions should be nullable (as well as omittable)
In ERROR, I think extensions, path, and location should be nullable (as well as omittable)
In implementations, this may result in in-memory wire types becoming pretty large due to the repetition of path, extension, and especially error field types. It's not a new problem for this sort of thing, but careful use of generics and other language/runtime features that help with sharing or parameterization will be useful where performance matters. I'd expect the JSON Wire schema serialization of these to get unwieldy quickly.
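The byte cost of a VARINT tag mentioned in the first note above can be illustrated with a small sketch. It assumes the tag is zigzag-encoded and then written in 7-bit groups with a continuation bit, which is what "the negative numbers will be wasted" suggests. The helper names `zigzag` and `encode_varint` are made up for illustration and are not taken from an Argo implementation.

```python
def zigzag(n: int) -> int:
    # Interleave signed values: 0, -1, 1, -2, 2 ... become 0, 1, 2, 3, 4 ...
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def encode_varint(n: int) -> bytes:
    # Assumed encoding: zigzag first, then 7 bits per byte, low group first,
    # with the high bit set on every byte except the last.
    z = zigzag(n)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


# UNION tags are never negative, so half of each byte's range goes unused.
one_byte_tags = [t for t in range(200) if len(encode_varint(t)) == 1]
```

Under these assumptions, tags 0 through 63 fit in a single byte and tag 64 already needs two, whereas an unsigned encoding would reach 127 before growing. That is the "few extra bytes" cost for queries with many @defer/@stream usages.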

Thanks for looking into this!

Andrew Bennett · Answer 2 · Sat Feb 10 2024 03:55:44 GMT+0800 (China Standard Time)

It's a small bummer to introduce a new UNION type, but I think it's the best option. I think we'd use a VARINT tag for implementation simplicity, even though the negative numbers will be wasted (so using many different @defer/@stream in the same query will use a few extra bytes). One alternative to UNION would be to fake it in a synthetic RECORD with lots of fields (perhaps one RECORD for @stream and one for @defer) but that feels unnecessarily awkward to me, and the payloads would be unnecessarily large.

Yeah, I played around with a few different options here as well, but I thought a simple VARINT tagged UNION might be the simplest solution. The label representation <1:"defer_z"> would be internal to the wire-type only and useful for converting an Argo value back into the JSON representation where the {"label": "defer_z", "path": ..., "data": ...} would need to be inserted. For wire encoding/decoding, it would just be a normal VARINT(1).
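A minimal sketch of that conversion, assuming the decoder has access to a table derived from the wire type that maps each union tag to its optional label. `UNION_LABELS` and `to_json_payload` are hypothetical names, and the payload contents are placeholders.

```python
from typing import Any, Optional

# Tag -> label, derived from the UNION wire type for the example query.
# Only the tag is written to the wire; the labels live in the wire type.
UNION_LABELS: dict[int, Optional[str]] = {
    0: None,        # ... @defer
    1: "defer_z",   # ... @defer(label: "defer_z")
    2: None,        # children_x: children @stream
    3: "stream_z",  # children_y: children @stream(label: "stream_z")
}


def to_json_payload(tag: int, decoded: dict[str, Any]) -> dict[str, Any]:
    """Re-insert the label (if any) when converting a decoded value to JSON."""
    label = UNION_LABELS[tag]
    if label is None:
        return dict(decoded)
    return {"label": label, **decoded}
```

For example, `to_json_payload(1, {"path": ["root"], "data": {"deferred_y": "Object"}})` yields a dict with `"label": "defer_z"` added, while tag 0 yields the payload unchanged.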

The items field should be nullable in @stream payloads (I see you already made data nullable for @defer payloads)

Oh, yup, you're correct. I wrote the pseudo wire type by hand so there may be other accidental mistakes.

Inside the union, the path is fully knowable for @defer, and knowable except the final index for @stream. We could avoid encoding it (perhaps including only the index), or drop the known prefix as we do in Argo Field errors and add it back in after decoding. I'm inclined toward this last, for consistency's sake.

The path for @defer is only partially known prior to execution, consider the case where @defer occurs underneath an array:

schema {
    query: Query
}

type Query {
    x: X!
}

type X {
    ys: [Y!]!
}

type Y {
    z: Z!
}

type Z {
    name: String!
}

query {
    x {
        ys {
            z {
                __typename
                ... @defer {
                    name
                }
            }
        }
    }
}

The path for the @defer in this case would be ["x", "ys", VARINT, "z"], where VARINT is the list index: each item under ys gets its own incremental reply with its own path (the same can be said for nested cases of @stream).
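To make this concrete, here is a small sketch that enumerates the paths for the query above once the length of ys is known at execution time. The `INDEX` placeholder and the `concrete_paths` helper are assumptions for illustration, and only a single list level is handled.

```python
INDEX = object()  # placeholder for a list index known only at execution time

# Statically known shape of the @defer path in the query above.
TEMPLATE = ["x", "ys", INDEX, "z"]


def concrete_paths(template, list_length):
    """Yield one concrete path per list item (single list level only)."""
    for i in range(list_length):
        yield [i if seg is INDEX else seg for seg in template]


# With three items under ys, three separate incremental payloads are expected.
paths = list(concrete_paths(TEMPLATE, 3))
```

Only the prefix ["x", "ys"] and the trailing "z" can be derived from the query. The index in between has to come from the payload itself.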

Inside the union, label (if any) is always known. We can avoid sending it over the wire or including it in the wire type, but make it available to users. (I think you had this in mind, guessing from your UNION syntax)

Correct, it's primarily used for converting from and to the JSON representation; only the VARINT index is encoded/decoded for the tagged UNION.

In incremental, I think extensions should be nullable (as well as omittable)

There's some discussion in the PR about this:

The GraphQL server may determine there are no more values in the response stream after a previous value with hasNext equal to true has been emitted. In this case the last value in the response stream should be a map without data and incremental entries, and a hasNext entry with a value of false.

I don't think {"incremental": null} has any meaning in the way that {"data": null} does; the response would instead always look more like {"incremental": [{"path": ..., "data": null, "errors": [...]}]}.

At least that's my current understanding after reading through the specs. For extensions, see my comment below and let me know what you think.
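As a rough sketch of how a client might consume such a response stream, assume that payloads are already decoded into JSON-like dicts, that a @defer payload's path points at the object to merge into, and that a @stream payload's path ends with the index of its first item. `apply_incremental` and `consume` are hypothetical helpers, and error handling is omitted.

```python
def apply_incremental(data, payload):
    path = payload["path"]
    if "items" in payload:
        # @stream: walk to the list, then insert starting at the final index.
        *parent, start = path
        target = data
        for key in parent:
            target = target[key]
        if payload["items"] is not None:
            target[start:start] = payload["items"]
    else:
        # @defer: walk to the object and merge the deferred fields into it.
        target = data
        for key in path:
            target = target[key]
        if payload["data"] is not None:
            target.update(payload["data"])


def consume(stream):
    data = None
    for message in stream:
        if "data" in message:
            data = message["data"]
        # "incremental" is either absent or a list, never null.
        for payload in message.get("incremental", []):
            apply_incremental(data, payload)
        if not message.get("hasNext", False):
            break  # the last message may be a bare {"hasNext": False}
    return data
```

Note that the final message carries neither data nor incremental, which is why both are read with defaults rather than required.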

In ERROR, I think extensions, path, and location should be nullable (as well as omittable)

Following the wording of the Errors section of the GraphQL Spec:

If present, the errors entry in the response must contain at least one error. If no errors were raised during the request, the errors entry must not be present in the result.
…
Every error must contain an entry with the key message with a string description of the error intended for the developer as a guide to understand and correct the error.
…
GraphQL services may provide an additional entry to errors with key extensions. This entry, if set, must have a map as its value.

Nothing is explicitly stated about path and location, but I had interpreted it to have similar meaning to extensions where it either should be present and of a specific format, or otherwise omitted entirely.

This also seems to imply that the typing for errors?: ERROR[]? might be better represented as errors?: ERROR[].
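One way an encoder could uphold the "at least one error, or no entry at all" rule is sketched here. `with_errors` is a hypothetical helper operating on JSON-like dicts, not part of any Argo implementation.

```python
def with_errors(payload, errors):
    """Return a copy of payload with "errors" set only when non-empty."""
    result = {k: v for k, v in payload.items() if k != "errors"}
    if errors:
        # Present means non-empty, so the field never needs to be null.
        result["errors"] = list(errors)
    return result
```

With this shape, an empty or missing error list is always expressed by omission, so a non-nullable errors?: ERROR[] is sufficient.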

What do you think?

In implementations, this may result in in-memory wire types becoming pretty large due to the repetition of path, extension, and especially error field types. It's not a new problem for this sort of thing, but careful use of generics and other language/runtime features that help with sharing or parameterization will be useful where performance matters. I'd expect the JSON Wire schema serialization of these to get unwieldy quickly.

Yeah, I thought about potentially introducing a RESPONSE wire type that might make it easier for implementations to (1) reference fragments of records and (2) make path validation more standardized. Something like:

{
    "type": "RESPONSE",
    "data": {
        "type": "RECORD",
        "fields": [...]
    },
    "incremental": [
        {
            "type": "DEFER",
            "index": 0,
            "data": ...
        },
        {
            "type": "STREAM",
            "index": 1,
            "item": ...
        }
    ]
}

Internally, it could expand to the full wire-type involving errors and extensions.

Fields underneath the data key for the RESPONSE could reference the incremental portions with something like {"type": "FRAGMENT", "index": 0} or similar.

This would also make it so the encoding for PATH could have its starting point underneath data, which matches how it's used in the JSON encoding.

Mike Solomon · Answer 3 · Sat Feb 10 2024 10:08:24 GMT+0800 (China Standard Time)

Yeah, I played around with a few different options here as well, but I thought a simple VARINT tagged UNION might be the simplest solution. The label representation <1:"defer_z"> would be internal to the wire-type only and useful for converting an Argo value back into the JSON representation where the {"label": "defer_z", "path": ..., "data": ...} would need to be inserted. For wire encoding/decoding, it would just be a normal VARINT(1).

Yeah, it would be handy to have the label value available. Instead of baking it into the union (which is probably the only place it will be used), I'm somewhat more inclined to introduce a type like CONST_STRING:

  incremental?: UNION {
    ...
    <1>: {
      label: CONST_STRING="defer_z"
      path: PATH
      data: {
        deferred_y: STRING<String>
      }?
      errors?: ERROR[]?
      extensions?: DESC_OBJECT
    }

Another small bummer, but clear enough. It would naturally extend to constants of other types, or even default values, but GraphQL has little or no need for these at the moment.
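A sketch of how such a constant could behave, assuming it occupies zero bytes on the wire and is simply materialized by the decoder. The `ConstString` class and its `encode`/`decode` methods are invented for illustration and do not correspond to an existing Argo type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstString:
    value: str

    def encode(self, value: str) -> bytes:
        # Nothing is written; the value must match the constant in the wire type.
        if value != self.value:
            raise ValueError(f"expected {self.value!r}, got {value!r}")
        return b""

    def decode(self, buf: bytes, offset: int) -> tuple[str, int]:
        # Nothing is read; the offset is returned unchanged.
        return self.value, offset


label_type = ConstString("defer_z")
```

The same pattern would carry over to constants of other types by swapping the value's type, which matches the remark that the idea extends naturally.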

The path for @defer is only partially known prior to execution, consider the case where @defer occurs underneath an array:
...

Great explanation, thanks. The same probably applies to @stream as well. In that case, perhaps it's simplest to leave PATHs unmodified. Of course, the main alternative would be to truncate at the first list/index. I'm not sure it's worth the hassle.
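The truncation alternative could look roughly like this: split a path at its first list index, drop the statically known prefix before encoding, and re-attach it after decoding. `split_at_first_index` is a hypothetical helper, and treating every int segment as a list index is an assumption.

```python
def split_at_first_index(path):
    """Split a path into (known prefix, remainder starting at the first index)."""
    for i, seg in enumerate(path):
        if isinstance(seg, int):
            return path[:i], path[i:]
    return list(path), []


# Only the suffix would need to be encoded; the prefix comes from the query.
prefix, suffix = split_at_first_index(["x", "ys", 2, "z"])
restored = prefix + suffix
```

The decoder still has to know which prefix belongs to which union member, which is part of the hassle weighed above.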

In incremental, I think extensions should be nullable (as well as omittable)
...

What do you think?

Your reasoning makes sense to me, I had just checked what types I used in the reference implementation for ERROR. IIRC I wanted to support whatever JSON folks might have, but I like the stricter approach you take.

Yeah, I thought about potentially introducing a RESPONSE wire type ...

Nice. I think for now it's probably simplest to leave it up to implementations, and have the spec use the maximally-expanded version which everything must eventually be equivalent to.