mercurius-js / mercurius

Implement GraphQL servers and gateways with Fastify

Home Page:https://mercurius.dev/

OTEL instrumentation without routes: false

jasonkuhrt opened this issue

Hi, this issue might fall into one of the following categories, I'm not sure yet:

  • An issue with mercurius
  • An issue with fastify
  • An issue with our OTEL approach

We've instrumented our Fastify app using an approach based on wrapping each route handler with OTEL context setup per request, e.g.:

const onRoute: onRouteHookHandler = (routeOptions) => {
  // Keep the original handler, bound to the Fastify instance.
  const handler = routeOptions.handler.bind(fastify)
  // Replace it with a wrapper that runs the handler inside the request's OTEL context.
  routeOptions.handler = (request, ...args) => {
    const requestContext = getContext(request)
    return Otel.api.context.with(requestContext, () => {          // <--- this
      return handler(request, ...args)
    })
  }
}

Now, Mercurius ships with default routes, which is great. However, like most apps I guess, we need authentication. We do that with a Fastify preHandler hook, e.g.:

fastify.addHook(`preHandler`, async (request, reply) => {
    if (request.routerPath !== `/graphql`) return

    Otel.api.trace.getActiveSpan()?.setAttributes({
      'user.id': `unknown`,
    })
    // ... authentication
})

But there's a problem here (obvious in hindsight): the onRoute OTEL wrapping we're doing does not apply to the preHandler execution, because the hook does not run inside an OTEL context. The wrapping code cannot possibly know which route a preHandler hook is supposed to apply to (since it's not declarative).
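The effect described above can be reproduced without Fastify or OTEL. The following is a minimal sketch that uses Node's AsyncLocalStorage as a stand-in for the OTEL context (the Node context managers for OTEL are built on the same mechanism). The names storage, preHandler, handler, runHandlerOnlyWrapping and runLifecycleWrapping are hypothetical and exist only for this illustration.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

// Stand-in for the OTEL context; assumption: one store per request.
type Ctx = { traceId: string }
const storage = new AsyncLocalStorage<Ctx>()

// Each step records which context (if any) it observed.
function preHandler(log: string[]): void {
  log.push(`preHandler:${storage.getStore()?.traceId ?? 'none'}`)
}

function handler(log: string[]): void {
  log.push(`handler:${storage.getStore()?.traceId ?? 'none'}`)
}

// Mirrors the onRoute approach: only the route handler runs inside the context.
export function runHandlerOnlyWrapping(): string[] {
  const log: string[] = []
  preHandler(log) // runs before the context is entered
  storage.run({ traceId: 'abc' }, () => handler(log))
  return log
}

// Alternative: enter the context before any hook runs.
export function runLifecycleWrapping(): string[] {
  const log: string[] = []
  storage.run({ traceId: 'abc' }, () => {
    preHandler(log)
    handler(log)
  })
  return log
}
```

In the first function the hook sees no context, which corresponds to the missing user.id attribute and the inaccurate span duration; in the second, both steps see the same context.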

Consequences of losing observability include:

  • span duration is inaccurate
  • span attributes like user.id are missing

So how can this be solved?

My solution right now is to use mercurius with routes: false so that I can reimplement the routes with OTEL context.

I appreciate that there are app.graphql.* methods that will support us doing this, and I've looked at the https://github.com/mercurius-js/mercurius/blob/master/lib/routes.js module.

I think though, and this is where maybe there is a Mercurius issue, that there could be a middle ground where the routes can be augmented instead of having to be re-written.

I realize the suggestion is probably to use the hooks of Fastify, but, again, this seems to be fundamentally incompatible with OTEL? I would be happy to be wrong.

I am aware of fastify-otel npm package and our code was adapted from/inspired by theirs.

Thanks!

I'm pretty sure there is something that is incompatible between how you want to structure your code, Fastify, Mercurius and tracing.

The biggest code smell is that you are defining a top-level hook that you are skipping for one route. You should not be doing that, but rely on encapsulation.

The hooks system was designed to support tracing with minimum effort. You can see from https://github.com/open-telemetry/opentelemetry-js-contrib/blob/main/plugins/node/opentelemetry-instrumentation-fastify/src/instrumentation.ts that very little needs to be done.

The bad news is that it is pretty hard to tell unless I can see a code example.

The biggest code smell is that you are defining a top-level hook that you are skipping for one route. You should not be doing that, but rely on encapsulation.

What is the proper way to add authentication for the /graphql endpoint in Mercurius?

The hooks system was designed to support tracing with minimum effort. You can see from open-telemetry/opentelemetry-js-contrib@main/plugins/node/opentelemetry-instrumentation-fastify/src/instrumentation.ts that very little needs to be done.

We referenced this implementation https://github.com/autotelic/fastify-opentelemetry/blob/main/index.js#L140. The official one looks more complex to me.

The biggest code smell is that you are defining a top-level hook that you are skipping for one route. You should not be doing that, but rely on encapsulation.

What is the proper way to add authentication for the /graphql endpoint in Mercurius?

The same way you do. But I would not add your other routes on the top-level Fastify instance; instead, use one encapsulated plugin for GraphQL and one for the rest.
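A minimal sketch of that layout, assuming Fastify and Mercurius are installed. The schema, the /health route, the demo token check, and the names graphqlPlugin, restPlugin and buildApp are all hypothetical placeholders, not code from this thread.

```typescript
import Fastify, { FastifyInstance } from 'fastify'
import mercurius from 'mercurius'

// Hypothetical schema and resolver, only here to make the sketch runnable.
const schema = `type Query { hello: String }`
const resolvers = { Query: { hello: async () => 'world' } }

// Encapsulated plugin: hooks added here apply only to routes registered
// inside this plugin, so no path check is needed in the hook.
async function graphqlPlugin(instance: FastifyInstance): Promise<void> {
  instance.addHook('preHandler', async (request, reply) => {
    // Placeholder authentication check (assumption: a static demo token).
    if (request.headers.authorization !== 'Bearer demo-token') {
      return reply.code(401).send({ error: 'unauthorized' })
    }
  })
  instance.register(mercurius, { schema, resolvers })
}

// Second encapsulated plugin for everything that is not GraphQL.
async function restPlugin(instance: FastifyInstance): Promise<void> {
  instance.get('/health', async () => ({ ok: true }))
}

export function buildApp(): FastifyInstance {
  const app = Fastify()
  app.register(graphqlPlugin)
  app.register(restPlugin)
  return app
}
```

With this structure the authentication hook never runs for /health, and the GraphQL routes keep the defaults that Mercurius provides.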

The hooks system was designed to support tracing with minimum effort. You can see from open-telemetry/opentelemetry-js-contrib@main/plugins/node/opentelemetry-instrumentation-fastify/src/instrumentation.ts that very little needs to be done.

We referenced this implementation https://github.com/autotelic/fastify-opentelemetry/blob/main/index.js#L140. The official one looks more complex to me.

This is incorrect if you want to achieve automatic instrumentation (like you want to do) and cover all hooks.

What would help is an example that reproduces your problem. I have the feeling that something is missing in your codebase to make it all work.


But there's a problem here (obvious in hindsight): the onRoute OTEL wrapping we're doing does not apply to the preHandler execution, because the hook does not run inside an OTEL context. The wrapping code cannot possibly know which route a preHandler hook is supposed to apply to (since it's not declarative).

Why is it not going to apply to the mercurius routes?

Probably the next step is to carefully review the cited OTEL instrumentation code for its features and differences from the one we initially referenced. I did notice it's wrapping more hooks than we are!

Our current approach is to wrap every route handler with OTEL. As you point out, that will miss anything that runs in other hooks.

We're moving toward ESM which is incompatible currently (last I checked) with OTEL auto instrumentation. Maybe though we'll revisit this in the future!

If, as we continue with these next steps, we find a real design issue, we'll return here with it. But as it stands, it looks like our issue is related to how we're instrumenting.

Thanks @mcollina!