elastic / apm-agent-java

Home Page: https://www.elastic.co/guide/en/apm/agent/java/current/index.html

Spring Webflux 3.2: agent attachment causes java.lang.IllegalStateException: The underlying HTTP client completed without emitting a response.

EvgeniGordeev opened this issue · comments

Describe the bug

Using Spring Boot 3.2.2 Webflux with attached APM agent 1.49.0 on JDK 21.0.3:

java.lang.IllegalStateException: The underlying HTTP client completed without emitting a response.
	at org.springframework.web.reactive.function.client.DefaultWebClient.lambda$static$0(DefaultWebClient.java:78) ~[spring-webflux-6.1.3.jar!/:6.1.3]
	at reactor.core.publisher.MonoErrorSupplied.subscribe(MonoErrorSupplied.java:55) ~[reactor-core-3.6.2.jar!/:3.6.2]
	at reactor.core.publisher.Mono.subscribe(Mono.java:4512) ~[reactor-core-3.6.2.jar!/:3.6.2]
	at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onComplete(FluxSwitchIfEmpty.java:82) ~[reactor-core-3.6.2.jar!/:3.6.2]
	at reactor.core.publisher.FluxOnAssembly$OnAssemblySubscriber.onComplete(FluxOnAssembly.java:549) ~[reactor-core-3.6.2.jar!/:3.6.2]
	at co.elastic.apm.agent.reactor.TracedSubscriber.onComplete(TracedSubscriber.java:141) ~[na:na]

Steps to reproduce

We don't have a sample test app yet, but our findings indicate that the only factor leading to this error is attaching the agent jar to the java command. Without the agent attached, the application works. With APM enabled, it fails only on Spring Boot 3.2.2 and works fine on Spring Boot 2.7.10.

Another finding: the application's endpoints all work except for those that use Mono.zip.

Expected behavior

Spring Webflux endpoint works the same way with or without APM.

Debug logs

See logs attached
apm.log

We've seen a few issues like this with WebFlux, usually because the agent slightly changes the application's timing, which exposes an existing race condition, rather than because of a bug in the agent. We don't rule out a bug in the agent, but all we've done here is wrap the subscriber so that we can note the entry and exit of each subscriber call and then delegate the call (here, onComplete) back to the application. If it's an underlying application race condition, you'll have difficulty reproducing it outside the app. It might be worth investigating why the exception could occur independently of the agent.
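To illustrate the kind of wrapping described above, here is a minimal sketch of a subscriber that records entry and exit around each signal and then delegates it unchanged. This is not the agent's actual TracedSubscriber: the class name LoggingSubscriber, the events list, and the use of the JDK's java.util.concurrent.Flow interfaces (instead of Reactor's types) are all assumptions made so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

// Hypothetical illustration only; this is NOT the agent's TracedSubscriber.
final class LoggingSubscriber<T> implements Flow.Subscriber<T> {
    private final Flow.Subscriber<? super T> delegate;
    private final List<String> events; // stands in for span activation/deactivation

    LoggingSubscriber(Flow.Subscriber<? super T> delegate, List<String> events) {
        this.delegate = delegate;
        this.events = events;
    }

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
        around("onSubscribe", () -> delegate.onSubscribe(subscription));
    }

    @Override
    public void onNext(T item) {
        around("onNext", () -> delegate.onNext(item));
    }

    @Override
    public void onError(Throwable error) {
        around("onError", () -> delegate.onError(error));
    }

    @Override
    public void onComplete() {
        around("onComplete", delegate::onComplete);
    }

    // Note entry, forward the signal untouched, then note exit (even on failure).
    private void around(String signal, Runnable call) {
        events.add("enter " + signal);
        try {
            call.run();
        } finally {
            events.add("exit " + signal);
        }
    }
}
```

In this sketch the wrapper neither adds nor drops signals: if the upstream completes without emitting an item, the delegate sees exactly that. The only effect of the wrapper is the extra work done around each call, which is how a wrapper of this kind could shift timing without changing the order or content of the signals.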

In any case, I can't see a bug in the agent from the given details, so without a reproducible test we can't really make progress.