reactor / reactor-netty

TCP/HTTP/UDP/QUIC client/server with Reactor over Netty

Home Page: https://projectreactor.io

TraceId lost in Netty custom LoggingHandler READ event

syedyusufh opened this issue

The traceId is lost only for the READ event logged by the custom LoggingHandler below, on Spring Boot v3.1.5 and Netty:

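import static io.netty.handler.logging.ByteBufFormat.SIMPLE;
import static io.netty.handler.logging.LogLevel.DEBUG;
import static java.nio.charset.StandardCharsets.UTF_8;

import io.netty.buffer.ByteBufHolder;
import io.netty.buffer.EmptyByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.http.DefaultLastHttpContent;
import io.netty.handler.logging.LoggingHandler;
import io.netty.util.AttributeKey;
import java.util.Objects;
import org.apache.commons.lang3.StringUtils;
import org.springframework.stereotype.Component;
import reactor.netty.http.client.HttpClient;

@Component
public class CustomWebClientLogger extends LoggingHandler {

	public CustomWebClientLogger() {
		// Log under the HttpClient category, at DEBUG level, with the SIMPLE (non hex dump) format.
		super(HttpClient.class, DEBUG, SIMPLE);
	}

	@Override
	protected String format(ChannelHandlerContext ctx, String event, Object arg) {

		var channel = ctx.channel();

		// Only READ/WRITE events whose message holds a buffer (e.g. HttpContent) are formatted.
		if (arg instanceof ByteBufHolder byteBufHolder && StringUtils.equalsAny(event, "READ", "WRITE")) {

			var msg = byteBufHolder.content();
			var logMsg = msg.toString(UTF_8);

			if ("WRITE".equals(event))
				return "DownStream Request: " + logMsg;

			if ("READ".equals(event)) {

				// Response chunks are accumulated in a channel attribute until the last one arrives.
				var channelId = channel.id().asLongText();

				AttributeKey<StringBuilder> readMsgAttrKey = AttributeKey.valueOf(channelId);
				var readMsgAttr = channel.attr(readMsgAttrKey);
				var readMsgAttrStrBldr = readMsgAttr.get();

				if (Objects.isNull(readMsgAttrStrBldr)) {
					readMsgAttrStrBldr = new StringBuilder(logMsg);
					readMsgAttr.set(readMsgAttrStrBldr);
				} else {
					readMsgAttrStrBldr.append(logMsg);
				}

				// Last chunk: log the whole body and clear the attribute.
				if (arg instanceof DefaultLastHttpContent || msg instanceof EmptyByteBuf)
					return "DownStream Response: " + readMsgAttr.getAndSet(null);
			}
		}

		// Other events, and intermediate READ chunks, are formatted as an empty string.
		return "";
	}
}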
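The READ branch above buffers body chunks per channel and logs the whole response only once the last chunk (a DefaultLastHttpContent or an empty buffer) arrives. A JDK-only sketch of that bookkeeping follows; ChunkBuffer and the plain String channel ids are hypothetical stand-ins for Netty channel attributes, not Netty types.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the READ accumulation in CustomWebClientLogger.
 * A HashMap keyed by channel id stands in for channel.attr(key); it is not
 * thread-safe, which matches a single channel being served by one event loop.
 */
final class ChunkBuffer {
    private final Map<String, StringBuilder> buffers = new HashMap<>();

    /** Appends a chunk; returns the full body on the last chunk, otherwise null. */
    String onRead(String channelId, String chunk, boolean last) {
        StringBuilder sb = buffers.computeIfAbsent(channelId, id -> new StringBuilder());
        sb.append(chunk);
        if (!last) {
            return null; // nothing to log yet
        }
        // Equivalent of readMsgAttr.getAndSet(null): hand back the body and clear the slot.
        return buffers.remove(channelId).toString();
    }
}
```

Only the final call yields a log line, which is why the "DownStream Response" entry is always emitted from a READ event.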

If you need the current tracing information in your LoggingHandler when the HTTP client writes the request or reads the response, then, if I'm correct, the context-propagation library can be used: it gives you access to the current observation, including tracing information such as the span id.
See https://projectreactor.io/docs/netty/release/reference/index.html#_tracing_4, section 6.13.1, Access Current Observation.

I'm attaching an example project with a custom LoggingHandler that uses the library; see server1/src/main/java/org/example/CustomWebClientLogger.java in example.tgz.

To run the example:

  • first, start Zipkin: docker run -d -p 9411:9411 openzipkin/zipkin
  • build the example project with Java 17: gradlew build
  • from one console, run server1, which listens on port 8080: java -jar server1/build/libs/server1-1.0.0.jar
  • from another console, run server2, which listens on port 8081: java -jar server2/build/libs/server2-1.0.0.jar
  • finally, from a third console, send a request to server1 with curl: curl -v http://localhost:8080/hello

server1 forwards the request to server2, and server1 shows the following logs produced by CustomWebClientLogger (see server1/src/main/java/org/example/CustomWebClientLogger.java):

20:55:43.217 [reactor-http-nio-2] DEBUG r.netty.http.client.HttpClient - [id: 0x2e649259, L:/127.0.0.1:65285 - R:localhost/127.0.0.1:8081] 
WRITE: 196B TracingContext{span=654be7bf2ff7337fa993e8e0c70be2cc/9c73f41334694d61}
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 47 45 54 20 2f 68 65 6c 6c 6f 20 48 54 54 50 2f |GET /hello HTTP/|
|00000010| 31 2e 31 0d 0a 75 73 65 72 2d 61 67 65 6e 74 3a |1.1..user-agent:|
|00000020| 20 52 65 61 63 74 6f 72 4e 65 74 74 79 2f 31 2e | ReactorNetty/1.|
|00000030| 31 2e 31 33 2d 53 4e 41 50 53 48 4f 54 0d 0a 68 |1.13-SNAPSHOT..h|
|00000040| 6f 73 74 3a 20 6c 6f 63 61 6c 68 6f 73 74 3a 38 |ost: localhost:8|
|00000050| 30 38 31 0d 0a 61 63 63 65 70 74 3a 20 2a 2f 2a |081..accept: */*|
|00000060| 0d 0a 58 2d 42 33 2d 54 72 61 63 65 49 64 3a 20 |..X-B3-TraceId: |
|00000070| 36 35 34 62 65 37 62 66 32 66 66 37 33 33 37 66 |654be7bf2ff7337f|
|00000080| 61 39 39 33 65 38 65 30 63 37 30 62 65 32 63 63 |a993e8e0c70be2cc|
|00000090| 0d 0a 58 2d 42 33 2d 53 70 61 6e 49 64 3a 20 61 |..X-B3-SpanId: a|
|000000a0| 39 39 33 65 38 65 30 63 37 30 62 65 32 63 63 0d |993e8e0c70be2cc.|
|000000b0| 0a 58 2d 42 33 2d 53 61 6d 70 6c 65 64 3a 20 31 |.X-B3-Sampled: 1|
|000000c0| 0d 0a 0d 0a                                     |....            |
+--------+-------------------------------------------------+----------------+

20:55:43.222 [reactor-http-nio-2] DEBUG r.netty.http.client.HttpClient - [id: 0x2e649259, L:/127.0.0.1:65285 - R:localhost/127.0.0.1:8081] 
READ: 83B TracingContext{span=654be7bf2ff7337fa993e8e0c70be2cc/9c73f41334694d61}
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d |HTTP/1.1 200 OK.|
|00000010| 0a 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 20 74 |.Content-Type: t|
|00000020| 65 78 74 2f 70 6c 61 69 6e 3b 63 68 61 72 73 65 |ext/plain;charse|
|00000030| 74 3d 55 54 46 2d 38 0d 0a 43 6f 6e 74 65 6e 74 |t=UTF-8..Content|
|00000040| 2d 4c 65 6e 67 74 68 3a 20 35 0d 0a 0d 0a 68 65 |-Length: 5....he|
|00000050| 6c 6c 6f                                        |llo             |
+--------+-------------------------------------------------+----------------+

Let me know if this is what you are trying to achieve, and whether it works.

Thanks @pderop for the detailed response. In the shared sample, you read the tracing context and append it to your log statements.

We are trying to do the same through the MDC: we expect the traceId and spanId to be available when Logback writes the log line. With Hooks.enableAutomaticContextPropagation() and Brave as the tracing library, we get the traceId and spanId for the WRITE event; the context is missing only for the READ event :(
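One way to picture this asymmetry, assuming that automatic context propagation only re-installs thread locals for work the application hands to the event loop: writing the request is scheduled from application code, so the caller's thread locals travel with it, whereas a READ is fired by the event loop itself when bytes arrive, so nothing re-installs them. The sketch below is a hypothetical JDK-only model, not Reactor or Netty code: TRACE_ID stands in for the MDC, propagating for Reactor's propagation, and the single-thread executor for a Netty event loop.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical model of why WRITE sees the trace context and READ does not. */
final class EventLoopContextDemo {
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Captures the caller's TRACE_ID and re-installs it around the task on the loop thread.
    static Runnable propagating(Runnable task) {
        String captured = TRACE_ID.get();
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(captured);
            try {
                task.run();
            } finally {
                // Leave the loop thread as it was before the task.
                if (previous == null) TRACE_ID.remove(); else TRACE_ID.set(previous);
            }
        };
    }

    /** Returns {traceId seen by the WRITE task, traceId seen by the READ task}. */
    static String[] run(String traceId) {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        String[] seen = new String[2];
        TRACE_ID.set(traceId); // the application thread holds the trace context
        try {
            // WRITE: scheduled from application code, so the context is carried over.
            eventLoop.submit(propagating(() -> seen[0] = TRACE_ID.get())).get();
            // READ: fired by the loop itself when bytes arrive; nothing is carried over.
            eventLoop.submit(() -> { seen[1] = TRACE_ID.get(); }).get();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            TRACE_ID.remove();
            eventLoop.shutdown();
        }
    }
}
```

In this model, run("654be7bf2ff7337f") yields the traceId for the WRITE slot and null for the READ slot, mirroring the symptom reported above.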

Please let me know if more information is required.

Can you provide a minimal reproducer project that I can build and test? (Maybe you can start by modifying "server1" from the attached example.)

Thank you.

Sure, please check https://github.com/syedyusufh/webclient-missing-traceid.git
You should see the "DownStream Response" line logged without the traceId and spanId.

P.S.: I have included many of the original application's dependencies, to check whether any of them is causing a conflict.

Can you try this:

In your reproducer, in the CustomWebClientLogger class:

Add the following imports:

import brave.baggage.BaggageField;
import io.micrometer.context.ContextSnapshot;
import io.micrometer.observation.Observation;
import io.micrometer.tracing.handler.TracingObservationHandler;
import org.springframework.beans.factory.annotation.Autowired;

import static reactor.netty.Metrics.OBSERVATION_REGISTRY;

Inject the following fields:

	@Autowired
	private BaggageField customTraceIdField;

	@Autowired
	private BaggageField customSpanIdField;

Add this method:

	private void getTracingInfo(ChannelHandlerContext ctx) {
		try (ContextSnapshot.Scope scope = ContextSnapshot.setAllThreadLocalsFrom(ctx.channel())) {
			Observation obs = OBSERVATION_REGISTRY.getCurrentObservation();
			TracingObservationHandler.TracingContext tc = obs != null ? obs.getContextView().get(TracingObservationHandler.TracingContext.class) : null;
			if (tc != null) {
				var sampleCustomTraceId = "AppTraceId" + tc.getSpan().context().traceId();
				customTraceIdField.updateValue(sampleCustomTraceId);

				var sampleSpanId = "AppSpanId" + tc.getSpan().context().spanId();
				customSpanIdField.updateValue(sampleSpanId);
			}
		}
	}

Then modify your format method so that, when logging a READ, it calls getTracingInfo before doing the formatting:

	@Override
	protected String format(ChannelHandlerContext ctx, String event, Object arg) {

		var channel = ctx.channel();

		if (arg instanceof ByteBufHolder byteBufHolder && StringUtils.equalsAny(event, "READ", "WRITE")) {

			var msg = byteBufHolder.content();
			var logMsg = msg.toString(StandardCharsets.UTF_8);

			if ("WRITE".equals(event))
				return "DownStream Request: " + logMsg;

			if ("READ".equals(event)) {
				getTracingInfo(ctx);
				var channelId = channel.id().asLongText();

				AttributeKey<StringBuilder> readMsgAttrKey = AttributeKey.valueOf(channelId);
				var readMsgAttr = channel.attr(readMsgAttrKey);
				var readMsgAttrStrBldr = readMsgAttr.get();

				if (Objects.isNull(readMsgAttrStrBldr)) {
					readMsgAttrStrBldr = new StringBuilder(logMsg);
					readMsgAttr.set(readMsgAttrStrBldr);
				} else {
					readMsgAttrStrBldr.append(logMsg);
				}

				if (arg instanceof DefaultLastHttpContent || msg instanceof EmptyByteBuf)
					return "DownStream Response: " + readMsgAttr.getAndSet(null);
			}
		}

		return "";
	}

Let me know how it goes.

The above is only a first attempt; it seems that baggage fields should not be used here, so I'm continuing to investigate...

Thanks for the quick reply.

I tried removing the baggage fields and using the plain traceId / spanId, but the context is still missing for the READ event alone.

As a workaround, for now I have it working by storing the traceId / spanId (or the baggage fields) in a Netty channel attribute during the WRITE event, and putting them back into the MDC during the READ event.

Does this sound like a bug?

Hi,

Some updates:

Please ignore the fix I proposed previously in #2964 (comment).

Instead, please have a look at webclient-missing-traceid.tgz, which adapts your original reproducer project.

The following things have been modified:

  • the baggage fields are unnecessary, and they expose the server observation rather than the client observation, so BraveConfig, RequestAdvice, and ResponseAdvice have been removed from the original webclient-missing-traceid reproducer
  • in application.yml, customTraceId and customSpanId have been replaced by traceId and spanId, respectively
  • finally, as documented in https://projectreactor.io/docs/netty/release/reference/index.html#_tracing_2 (4.11.1. Access Current Observation), CustomWebClientLogger should call ContextSnapshot.setAllThreadLocalsFrom(ctx.channel()) to restore the thread-local context stored in the channel. The CustomWebClientLogger class now overrides the channelReadComplete, channelRead, and write methods to make sure thread locals are restored from the channel context
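A simplified, JDK-only model of what the setAllThreadLocalsFrom(ctx.channel()) call achieves inside the overridden callbacks: context kept on the channel is copied into thread locals for the duration of the callback, and the previous values are put back when the scope closes. FakeChannel, ChannelScope, MDC_TRACE_ID and RestoringReadLogger are hypothetical names; the real ContextSnapshot restores all thread locals registered with the context-propagation library, not a single value.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a Netty Channel that carries a context map. */
final class FakeChannel {
    final Map<String, String> context = new HashMap<>();
}

/** Hypothetical stand-in for ContextSnapshot.Scope: installs a thread local, restores it on close. */
final class ChannelScope implements AutoCloseable {
    // Stand-in for the MDC entry that Logback would print.
    static final ThreadLocal<String> MDC_TRACE_ID = new ThreadLocal<>();
    private final String previous;

    private ChannelScope(String previous) {
        this.previous = previous;
    }

    // Copies the channel's traceId into the thread local, remembering the old value.
    static ChannelScope setThreadLocalsFrom(FakeChannel channel) {
        ChannelScope scope = new ChannelScope(MDC_TRACE_ID.get());
        MDC_TRACE_ID.set(channel.context.get("traceId"));
        return scope;
    }

    @Override
    public void close() {
        // Put back whatever the event-loop thread had before the callback.
        if (previous == null) MDC_TRACE_ID.remove(); else MDC_TRACE_ID.set(previous);
    }
}

/** Models an overridden channelRead: the log line is built inside the scope. */
final class RestoringReadLogger {
    static String channelRead(FakeChannel channel, String body) {
        try (ChannelScope ignored = ChannelScope.setThreadLocalsFrom(channel)) {
            return ChannelScope.MDC_TRACE_ID.get() + " DownStream Response: " + body;
        }
    }
}
```

Using try-with-resources guarantees the event-loop thread does not keep a stale traceId after the callback, which matters because the same thread serves many unrelated channels.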

With the modified project attached, we can now see the real tracing information:

2023-11-13 16:41:40,897 [reactor-http-nio-3] 655243b4a6769be6835c8c78b233abcd|58e2812922fb36d5 DownStream Request: {"greeting":"Hello, Good Evening!"}

2023-11-13 16:41:41,001 [reactor-http-nio-3] 655243b4a6769be6835c8c78b233abcd|58e2812922fb36d5 DownStream Response: {
  "args": {},
  "data": {
    "greeting": "Hello, Good Evening!"
  },
  "files": {},
  "form": {},
  "headers": {
    "x-forwarded-proto": "https",
    "x-forwarded-port": "443",
    "host": "postman-echo.com",
    "x-amzn-trace-id": "Root=1-655243b4-171477e005b1bd0062c9898c",
    "content-length": "35",
    "user-agent": "ReactorNetty/1.1.12",
    "accept": "*/*",
    "content-type": "application/json",
    "b3": "655243b4a6769be6835c8c78b233abcd-58e2812922fb36d5-0"
  },
  "json": {
    "greeting": "Hello, Good Evening!"
  },
  "url": "https://postman-echo.com/post"
}

Can you check the modified project?

Works perfectly now, thanks a lot !!

Regarding BaggageField: we have some application-related fields that need to be available everywhere, and BaggageField is handy because it can be initialized once and used anywhere. Partly, it is used for tracing, to match the consumer/client reference.

Can you please suggest an alternative to the following deprecated call?

ContextSnapshot.setAllThreadLocalsFrom(ctx.channel())

Sorry @chemicL, @pderop, I overlooked the method's documentation.
Everything is working fine; we can close this issue.