reactor / reactor-netty

TCP/HTTP/UDP/QUIC client/server with Reactor over Netty

Home Page: https://projectreactor.io


high cardinality tags in micrometer metrics

grzegorz-moto opened this issue · comments

The metrics tags contain a random id that causes high memory usage and leads to a memory leak/exhaustion. This is due to the fact that Micrometer treats each new tag combination as a new metric.
The example: https://github.com/reactor/reactor-netty/blob/main/reactor-netty-core/src/main/java/reactor/netty/resources/MicrometerPooledConnectionProviderMeterRegistrar.java#L55
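To illustrate the mechanism (this is a toy model in plain Java, not Micrometer itself): a registry keys meters by name plus tags, so a random per-pool `id` tag turns every registration into a brand-new meter even when the remote address is the same. The names and values below are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// Toy model of a meter registry: each distinct (name, tags) pair is a separate meter.
// This mirrors why a random per-pool "id" tag creates a new Meter.Id on every registration.
public class MeterCardinality {
    static final Map<String, Object> registry = new LinkedHashMap<>();

    static void register(String name, Map<String, String> tags) {
        // Key = name + sorted tags, so any new tag value yields a new registry entry.
        String key = name + new TreeMap<>(tags);
        registry.putIfAbsent(key, new Object());
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            // Same remote address and name, but a fresh random id per short-lived pool.
            register("reactor.netty.connection.provider.active.connections",
                     Map.of("remote.address", "10.0.0.1:5060",
                            "id", UUID.randomUUID().toString(),
                            "name", "connection-provider-with-metrics"));
        }
        // 1000 meters instead of 1: the id tag alone makes each one unique.
        System.out.println(registry.size());
    }
}
```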

@grzegorz-moto Please specify Reactor Netty version. Also why do you need so many connection pools and do you need all of them operational all the time?

version 1.0.25
the application is creating short-lived TCP connections to multiple targets - multiple of them at a time.
After some time the heap dump contains tons of instances of io.micrometer.core.instrument.Meter$Id for

reactor.netty.connection.provider.active.connections
reactor.netty.connection.provider.total.connections
reactor.netty.connection.provider.max.connections
reactor.netty.connection.provider.pending.connections
reactor.netty.connection.provider.idle.connections
reactor.netty.connection.provider.max.pending.connections

where only the ID tag makes them unique

@grzegorz-moto Update at least to version 1.0.26, where we deregister the metrics for disposed connection pools (better, update to the latest one, 1.0.34).
https://github.com/reactor/reactor-netty/releases/tag/v1.0.26
#2608

Then ensure you have a configuration for disposeInactivePoolsInBackground; this will dispose all inactive connection pools and deregister their metrics.
See more info about the configuration here: https://projectreactor.io/docs/netty/release/reference/index.html#_connection_pool
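As a minimal sketch of such a configuration (the provider name and durations are illustrative, tune them to your workload): `disposeInactivePoolsInBackground(disposeInterval, poolInactivity)` periodically checks for pools with no active or pending connections and disposes them, which also removes their meters.

```java
import java.time.Duration;
import reactor.netty.resources.ConnectionProvider;

// Hypothetical values; the first Duration is how often the background check
// runs, the second is how long a pool must be inactive before disposal.
ConnectionProvider provider = ConnectionProvider
        .builder("my-provider")              // becomes the "name" tag on the metrics
        .metrics(true)
        .maxIdleTime(Duration.ofSeconds(60))
        .disposeInactivePoolsInBackground(Duration.ofSeconds(30), Duration.ofSeconds(60))
        .build();
```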

where only the ID tag makes them unique

This is interesting, because the metrics have ID, Remote Address, and Name tags; ideally you should not see different IDs for the same Remote Address.

I have ConnectionProvider and TcpClient configured in the following way:

    private static final ConnectionProvider CONNECTION_PROVIDER = ConnectionProvider
            .builder("connection-provider-with-metrics")
            .metrics(true)
            .maxIdleTime(Duration.ofSeconds(60))
            .evictInBackground(Duration.ofSeconds(30))
            .disposeInactivePoolsInBackground(Duration.ofSeconds(30), Duration.ofSeconds(60))
            .build();

    private static final TcpClient TCP_CLIENT = TcpClient
            .create(CONNECTION_PROVIDER)
            .wiretap(true)
            .metrics(true)
            .option(ChannelOption.SO_KEEPALIVE, true)
            .option(EpollChannelOption.TCP_KEEPCNT, SipTransportTcpConnectionConfiguration.getTcpKeepCnt())
            .option(EpollChannelOption.TCP_KEEPIDLE, SipTransportTcpConnectionConfiguration.getTcpKeepIdle())
            .option(EpollChannelOption.TCP_KEEPINTVL, SipTransportTcpConnectionConfiguration.getTcpKeepIntvl())
            .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, CONNECT_TIMEOUT)
            .doOnConnect(bootstrap -> log.debug("Trying to connect to the remote host [{}]", NetworkLoggingUtility.createRemoteHostKeyValue(bootstrap.remoteAddress().get())));

    private Mono<? extends Connection> connect(InetSocketAddress socketAddress) {
        log.info("Establishing new TCP connection to [{}]", socketAddress);

        return TCP_CLIENT
                .remoteAddress(() -> socketAddress)
                .doOnConnected(this::addHandlers)
                .doOnConnected(tcpHandler::handleConnection)
                .doOnDisconnected(tcpHandler::handleDisconnected)
                .observe(tcpHandler.connectionObserver())
                .connect()
                .retryWhen(CONNECTION_RETRY_SPEC)
                .onErrorMap(e -> new TransportFailureException(TransportFailureException.FailureStatus.CONNECTION_FAILURE, e));
    }

and in doOnDisconnected() dispose() is being called on the connection. Without this manual dispose the connections seem to leak.
After all these changes I'm still observing that the number of instances of io.micrometer.core.instrument.Meter$Id is constantly growing.

versions I'm using:
"io.projectreactor:reactor-bom:2022.0.9"
"io.netty:netty-bom:4.1.96.Final"

@grzegorz-moto Is it necessary that the code below NOT be included in the common configuration?

                .doOnConnected(this::addHandlers)
                .doOnConnected(tcpHandler::handleConnection)
                .doOnDisconnected(tcpHandler::handleDisconnected)
                .observe(tcpHandler.connectionObserver())

@grzegorz-moto I'm closing this one, we can reopen it when you are able to reply to the comment above.