reactor / reactor-netty

TCP/HTTP/UDP/QUIC client/server with Reactor over Netty

Home Page: https://projectreactor.io

Consistent Memory Increase in Webflux Application

aspOEDev opened this issue

I am relatively new to the Reactor framework. I have created a new BFF (backend-for-frontend) layer service for our application that integrates with 7 different downstream systems using WebFlux, but we are observing a gradual increase in our pods' memory consumption. When there are timeouts or downstream failures, memory starts spiking and does not come back to normal until the pod is restarted.

Below are the versions we have used:

  1. Java - 17.0.2
  2. spring-boot-starter-webflux - 2.7.15
  3. spring-webflux - 5.3.29
  4. spring-cloud-starter-gateway - 3.1.4

Below is how I have initialized our WebClient in a generic client service:

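// Imports assumed by this snippet; `log` and `metricsWebClientCustomizer` are presumably
// declared in the enclosing service class (e.g. a Lombok logger and an injected customizer).
import java.net.URI;
import java.time.Duration;
import java.util.function.Function;

import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.reactive.function.client.WebClient.ResponseSpec;
import org.springframework.web.util.UriBuilder;
import reactor.netty.http.client.HttpClientRequest;

public ResponseSpec get(String url, HttpHeaders headers, int timeOutinMillis,
                        Function<UriBuilder, URI> uriFunction) {
    log.info("Building get request for {}, headers {}", url, headers);
    // A brand-new builder and WebClient are created on every call.
    WebClient.Builder webClientBuilder = WebClient.builder();
    metricsWebClientCustomizer.customize(webClientBuilder);

    WebClient client = webClientBuilder.baseUrl(url).build();
    return client.get().uri(uriFunction).headers(h -> h.addAll(headers)).httpRequest(httpRequest -> {
        // Per-request response timeout set on the underlying Reactor Netty request.
        HttpClientRequest reactorRequest = httpRequest.getNativeRequest();
        reactorRequest.responseTimeout(Duration.ofMillis(timeOutinMillis));
    }).retrieve();
}

public ResponseSpec post(String url, HttpHeaders headers, Object body, int timeOutinMillis,
                         Function<UriBuilder, URI> uriFunction) {
    // Note: this logs the full request body at INFO level on every call.
    log.info("Building post request for {}, headers {}, body {}", url, headers, body);
    WebClient.Builder webClientBuilder = WebClient.builder();
    metricsWebClientCustomizer.customize(webClientBuilder);

    WebClient client = webClientBuilder.baseUrl(url).build();
    return client.post().uri(uriFunction).headers(h -> h.addAll(headers))
            .contentType(MediaType.APPLICATION_JSON)
            .bodyValue(body)
            .httpRequest(httpRequest -> {
                HttpClientRequest reactorRequest = httpRequest.getNativeRequest();
                reactorRequest.responseTimeout(Duration.ofMillis(timeOutinMillis));
            }).retrieve();
}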

Initially I used the autowired WebClient.Builder instance to initialize the client, but as load increased I observed calls for one downstream client going to the wrong APIs, i.e. calls were getting mixed up. So, as suggested on some blogs, I changed the approach to create a new builder instance with WebClient.builder() every time, and that solved the wrong-call issue. After analyzing heap dumps we also reduced logging to lower memory consumption, but so far this has only made the service take longer to crash.
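One plausible explanation for the mixed-up calls: `WebClient.Builder` is mutable, and the Spring Boot reference documentation notes that any change to a builder is reflected in all clients subsequently created from it, suggesting `builder.clone()` when one builder is reused. The JDK-only sketch below uses a hypothetical `MutableBuilder` as a stand-in (it is not Spring's class) to show how two requests interleaving on one shared instance can end up with the other request's base URL, and how copying before configuring avoids that:

```java
public class SharedBuilderDemo {

    // Hypothetical stand-in for a mutable builder such as WebClient.Builder:
    // setters mutate this instance and build() reads whatever state is current.
    static final class MutableBuilder {
        private String baseUrl = "";

        MutableBuilder baseUrl(String url) {
            this.baseUrl = url;
            return this;
        }

        // Copy-before-configure, similar in spirit to WebClient.Builder.clone().
        MutableBuilder copy() {
            return new MutableBuilder().baseUrl(this.baseUrl);
        }

        String build() {
            return "client for " + baseUrl;
        }
    }

    // Two requests share ONE builder and interleave: A configures, B configures, A builds.
    public static String interleavedShared() {
        MutableBuilder shared = new MutableBuilder();
        MutableBuilder forA = shared.baseUrl("https://service-a");
        MutableBuilder forB = shared.baseUrl("https://service-b"); // same object as forA
        return forA.build();
    }

    // Same interleaving, but each request works on its own copy of the shared builder.
    public static String interleavedCopied() {
        MutableBuilder shared = new MutableBuilder();
        MutableBuilder forA = shared.copy().baseUrl("https://service-a");
        MutableBuilder forB = shared.copy().baseUrl("https://service-b"); // independent of forA
        return forA.build();
    }

    public static void main(String[] args) {
        System.out.println(interleavedShared()); // client for https://service-b
        System.out.println(interleavedCopied()); // client for https://service-a
    }
}
```

In the real service, the analogous step would be calling `clone()` on the injected `WebClient.Builder` before `baseUrl(...)`, rather than creating a brand-new builder and `WebClient` for every request.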

This is what the memory trend looks like: after service startup the curve becomes relatively flat, but there is always a gradual increase until the service crashes:
[Image: pod memory usage over time]

We are running this on Tomcat in Kubernetes.

Below are screenshots of the current heap dump dominator tree:
[Images: two heap dump dominator tree screenshots]

According to the heap dump analysis, the primary suspect is the following class:

[Image: heap dump leak suspect details]
One instance of org.apache.tomcat.util.collections.SynchronizedStack loaded by org.springframework.boot.loader.LaunchedURLClassLoader @ 0xad2b9908 occupies 2,07,15,496 (20.88%) bytes. The memory is accumulated in one instance of java.lang.Object[], loaded by <system class loader>, which occupies 2,07,15,464 (20.88%) bytes.
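The byte counts in this report use Indian digit grouping (2,07,15,496 = 20,715,496). A quick conversion, assuming the percentage is relative to the total heap captured in the dump, puts the suspect and the overall heap size in perspective:

$$
\frac{20{,}715{,}496\ \text{B}}{1{,}048{,}576\ \text{B/MiB}} \approx 19.76\ \text{MiB}
\qquad
\frac{20{,}715{,}496\ \text{B}}{0.2088} \approx 9.92 \times 10^{7}\ \text{B} \approx 94.6\ \text{MiB}
$$

So the `SynchronizedStack` retains roughly 20 MiB of a heap of roughly 95 MiB at the time of the dump; because 20.88% is rounded, the total is approximate.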

Thread java.lang.Thread @ 0xaf1017a0 http-nio-8074-Acceptor has a local variable or reference to org.apache.tomcat.util.net.NioEndpoint @ 0xad8c4318 which is on the shortest path to java.lang.Object[500] @ 0xb1cb1e38. The thread java.lang.Thread @ 0xaf1017a0 http-nio-8074-Acceptor keeps local variables with total size 408 (0.00%) bytes.

Stack trace of the suspect thread in the heap dump:

http-nio-8074-Acceptor
  at sun.nio.ch.Net.accept(Ljava/io/FileDescriptor;Ljava/io/FileDescriptor;[Ljava/net/InetSocketAddress;)I (Net.java(Native Method))
  at sun.nio.ch.ServerSocketChannelImpl.implAccept(Ljava/io/FileDescriptor;Ljava/io/FileDescriptor;[Ljava/net/SocketAddress;)I (ServerSocketChannelImpl.java:425)
  at sun.nio.ch.ServerSocketChannelImpl.accept()Ljava/nio/channels/SocketChannel; (ServerSocketChannelImpl.java:391)
  at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept()Ljava/nio/channels/SocketChannel; (NioEndpoint.java:548)
  at org.apache.tomcat.util.net.NioEndpoint.serverSocketAccept()Ljava/lang/Object; (NioEndpoint.java:79)
  at org.apache.tomcat.util.net.Acceptor.run()V (Acceptor.java:129)
  at java.lang.Thread.run()V (Thread.java:833)

I tried to reproduce this on a local setup but was not able to. Any suggestions or guidance on where I can improve the application's performance would be really helpful.

@aspOEDev

Please clarify whether you create the WebClient for every request?

If yes then please check this: https://stackoverflow.com/questions/77715508/httpclient-recomendations

@aspOEDev The versions you mentioned are quite old; please upgrade to the latest supported versions.

@violetagg thanks for the suggestions.

Yes, I am creating a new WebClient and builder instance per request. In our initial iteration we used a common autowired builder instance, which resulted in API and request content getting mixed up and wrong calls being fired. Hence we went with the safest approach of creating a new instance per request. I understand we can cache a client instance per host and reduce the memory footprint to some extent. I will also upgrade the versions and validate.

Let me give it a try and get back to you.
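The per-host caching mentioned above can be sketched with JDK types only. `PerHostClientCache` is a hypothetical helper, not a Spring or Reactor Netty API; it stores one client per base URL in a `ConcurrentHashMap`, whose `computeIfAbsent` applies the factory at most once per key even when several requests race for the same host:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: builds at most one client per base URL and reuses it afterwards.
public final class PerHostClientCache<C> {

    private final Map<String, C> clients = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    public PerHostClientCache(Function<String, C> factory) {
        this.factory = factory;
    }

    // computeIfAbsent is atomic per key, so concurrent callers share one instance per host.
    public C get(String baseUrl) {
        return clients.computeIfAbsent(baseUrl, factory);
    }

    public int size() {
        return clients.size();
    }

    public static void main(String[] args) {
        AtomicInteger built = new AtomicInteger();
        PerHostClientCache<String> cache = new PerHostClientCache<>(url -> {
            built.incrementAndGet(); // counts how many "clients" were actually created
            return "client for " + url;
        });
        // 1000 simulated requests spread over two downstream hosts.
        for (int i = 0; i < 1000; i++) {
            cache.get(i % 2 == 0 ? "https://service-a" : "https://service-b");
        }
        System.out.println(built.get()); // 2
    }
}
```

In the WebFlux service, `C` would be `WebClient`, and the factory might clone the injected builder, apply the metrics customizer once, and call `baseUrl(baseUrl).build()` (an assumption about how the service is wired), while per-request settings such as the response timeout stay in the `httpRequest(...)` callback. Because entries are never evicted, this only suits a small, fixed set of hosts such as the 7 downstream systems; keying by full request URL would turn the cache itself into a leak.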

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open.