micrometer-metrics / context-propagation

Context Propagation API

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

traceId and spanId missing in access logs

oemergenc opened this issue · comments

Hi,

I am not sure wether this is a problem in the context-propagation or somewhere else, so I am trying my luck here. I am trying to set up a very simple spring cloud gateway application and I want to log traceIds and spanId. I was able to create a simple example which seems to work, although I am not sure wether I have configured context propagation the right way. Anyway, there is one test failing due to the fact that the traceId and spanId are missing from one request and I cannot understand why. Please have a look at this repo:

https://github.com/oemergenc/spring-cloud-gateway-upgrade

I have added a single integration test which is success for one request and fails for another and I have no idea why. The test is here:
https://github.com/oemergenc/spring-cloud-gateway-upgrade/blob/a79b683f7d6dd4194cdf21029c256471b550f915/src/test/java/com/gateway/ItTest.java#L13

Can anyone help me with that?

It would also be nice, if someone can check wether my configuration here:
https://github.com/oemergenc/spring-cloud-gateway-upgrade/blob/e3045ee2137d56b0fe0f7751409e48f39bce7f74/src/main/java/com/gateway/configuration/TracingConfiguration.java#L27
is correct to have working traceId and spanId propagation?

Thanks in advance.

Hi, any help on this would be really great. May I ask this question somewhere else?

I'm not an expert of Spring Cloud Gateway or Reactor-Netty, but I think this could be an issue with the instrumentation of one of them. The tracing information might be missing from those logs since the code that does the logging is not "in-scope" (does not have access to the tracing information/context).

Could you please try to improve on the reproducer a bit?

  • Is it minimal? Could you please clean it up and remove as many things as possible?
    • E.g.: grpc, gcp, Logbook, sleuth properties, io.micrometer:context-propagation dependency
    • Why do you need reactor-netty-http-brave, reactor-netty should be instrumented with the Observation API so Brave support should be in-place without this I think.
    • Hooks.enableAutomaticContextPropagation(); should be in the first line in your main method
    • I'm not sure registering ThreadLocalAccessors and Reactor-Netty components are needed anymore (the reactor-netty docs can tell).
    • I think a ContextSnapshotFactory bean is not necessary either.
    • Is TracingChannelDuplexHandler also necessary, I feel Reactor-Netty instrumentation should do this.

/cc @violetagg Should we move this issue into Reactor-Netty instead?

Let me first take a look.

  • Definitely reactor-netty-http-brave is not needed.
  • Is TracingChannelDuplexHandler also necessary, I feel Reactor-Netty instrumentation should do this.

No Reactor Netty is not gonna provide this.

  • there is no Reactor Netty tracing activated -> so something with SCG instrumentation.

More info reactor/reactor-netty#3003

@oemergenc You need to enable the tracing for Reactor Netty. The code below is just a sample. Please replace the Function.identity() with an appropriate function (check the javadoc to understand why it is important to provide this function https://projectreactor.io/docs/netty/release/api/reactor/netty/http/server/HttpServer.html#metrics-boolean-java.util.function.Function-):

  @Bean
  public NettyServerCustomizer defaultNettyServerCustomizer() {
    return server ->
        server
            .accessLog(true)
            .metrics(true, Function.identity())
            .doOnConnection(
                connection -> {
                  connection.addHandlerLast(tracingChannelDuplexHandler());
                });
  }

Hi @violetagg @jonatan-ivanov ,

thanks for your participation. I cleaned the repo a little bit up and tried to be as minimal as possible:I

  • I removed reactor-netty-http-brave
  • I was able to remove some of the code in the TracingConfiguration class.
  • I also enabled this .metrics(true, Function.identity()), for now I am using Function.identity() in this test.

All of this fixed the failing test. Thank you for your help.

Unfortunately I still need to say that I find the needed configuration very hard to understand. All of this is needed just to have traceIds and spanIds working. It feels like something will break with the next minor or patch update of any involved library.

Is this really the simplest way and most minimal way of getting tracing to work in reactor netty stack?
Why do I need to create the TracingChannelDuplexHandler with a ContextSnapshotFactory on my own?

Why do I need to create the TracingChannelDuplexHandler with a ContextSnapshotFactory on my own?

You need to manage this by yourself because it is too expensive to enable tracing for every possible event that Netty may send. You can decide which events are most important for your use case and enable the tracing only for those.

Since it seems it's working now and this issue has not much to do with the context-propagation project, let me close this issue.
@oemergenc If you feel that docs could be improved around this, please feel free to open an issue (or better: a PR) for Reactor Netty.
Maybe another one for Spring Boot to auto-configure this and add a property to enable/disable access logs and instrumentation, that way you might be able to get away by flipping properties.