High-cardinality tags in Micrometer metrics
grzegorz-moto opened this issue · comments
The metric tags contain a random id that causes high memory usage and leads to a memory leak/exhaustion. This is due to the fact that Micrometer treats each new tag combination as a new metric.
Example: https://github.com/reactor/reactor-netty/blob/main/reactor-netty-core/src/main/java/reactor/netty/resources/MicrometerPooledConnectionProviderMeterRegistrar.java#L55
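To illustrate the failure mode, here is a minimal stand-in (plain Java, not Micrometer's actual implementation): a registry keyed by metric name plus tags, where every new random id tag value produces a distinct meter id entry. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class CardinalityDemo {
    // Simplified stand-in for a meter registry: a meter is identified by
    // its name plus its tag key/value pairs, so every new tag value
    // creates a brand-new entry that is never reclaimed.
    static final Map<String, Long> METERS = new HashMap<>();

    static void record(String name, String poolId) {
        METERS.merge(name + "{id=" + poolId + "}", 1L, Long::sum);
    }

    public static void main(String[] args) {
        // One short-lived pool per connection => a unique random id tag each time.
        for (int i = 0; i < 1000; i++) {
            record("reactor.netty.connection.provider.active.connections",
                   UUID.randomUUID().toString());
        }
        // 1000 distinct meter ids accumulate for what is logically one metric.
        System.out.println(METERS.size());
    }
}
```

With real Micrometer the same effect shows up as ever-growing numbers of io.micrometer.core.instrument.Meter$Id instances in the heap.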
@grzegorz-moto Please specify the Reactor Netty version. Also, why do you need so many connection pools, and do you need all of them operational all the time?
Version 1.0.25.
The application creates short-lived TCP connections to multiple targets, many of them at a time.
After some time the heap dump contains tons of instances of io.micrometer.core.instrument.Meter$Id
for
reactor.netty.connection.provider.active.connections
reactor.netty.connection.provider.total.connections
reactor.netty.connection.provider.max.connections
reactor.netty.connection.provider.pending.connections
reactor.netty.connection.provider.idle.connections
reactor.netty.connection.provider.max.pending.connections
where only the ID tag makes each one unique.
@grzegorz-moto Update at least to version 1.0.26, where we deregister the metrics for disposed connection pools (better yet, update to the latest one, 1.0.34).
https://github.com/reactor/reactor-netty/releases/tag/v1.0.26
#2608
Then ensure you have configured disposeInactivePoolsInBackground; this will dispose all inactive connection pools and deregister their metrics.
See more info on the configuration here: https://projectreactor.io/docs/netty/release/reference/index.html#_connection_pool
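A minimal configuration sketch of the advice above (the provider name and interval values are illustrative, not recommendations):

```java
import java.time.Duration;

import reactor.netty.resources.ConnectionProvider;

// Hypothetical sketch: enable background disposal of inactive pools,
// which also deregisters the meters registered for those pools.
ConnectionProvider provider = ConnectionProvider
        .builder("provider-with-metrics")        // assumed pool name
        .metrics(true)
        .disposeInactivePoolsInBackground(
                Duration.ofSeconds(30),          // background check interval
                Duration.ofSeconds(60))          // inactivity threshold before disposal
        .build();
```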
> where only the ID tag makes it unique
This is interesting, because the metrics have ID, Remote Address, and Name tags; ideally you should not see different ID values for the same Remote Address.
I have the ConnectionProvider and TcpClient configured in the following way:
private static final ConnectionProvider CONNECTION_PROVIDER = ConnectionProvider
        .builder("connection-provider-with-metrics")
        .metrics(true)
        .maxIdleTime(Duration.ofSeconds(60))
        .evictInBackground(Duration.ofSeconds(30))
        .disposeInactivePoolsInBackground(Duration.ofSeconds(30), Duration.ofSeconds(60))
        .build();

private static final TcpClient TCP_CLIENT = TcpClient
        .create(CONNECTION_PROVIDER)
        .wiretap(true)
        .metrics(true)
        .option(ChannelOption.SO_KEEPALIVE, true)
        .option(EpollChannelOption.TCP_KEEPCNT, SipTransportTcpConnectionConfiguration.getTcpKeepCnt())
        .option(EpollChannelOption.TCP_KEEPIDLE, SipTransportTcpConnectionConfiguration.getTcpKeepIdle())
        .option(EpollChannelOption.TCP_KEEPINTVL, SipTransportTcpConnectionConfiguration.getTcpKeepIntvl())
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, CONNECT_TIMEOUT)
        .doOnConnect(bootstrap -> log.debug("Trying to connect to the remote host [{}]",
                NetworkLoggingUtility.createRemoteHostKeyValue(bootstrap.remoteAddress().get())));
private Mono<? extends Connection> connect(InetSocketAddress socketAddress) {
    log.info("Establishing new TCP connection to [{}]", socketAddress);
    return TCP_CLIENT
            .remoteAddress(() -> socketAddress)
            .doOnConnected(this::addHandlers)
            .doOnConnected(tcpHandler::handleConnection)
            .doOnDisconnected(tcpHandler::handleDisconnected)
            .observe(tcpHandler.connectionObserver())
            .connect()
            .retryWhen(CONNECTION_RETRY_SPEC)
            .onErrorMap(e -> new TransportFailureException(TransportFailureException.FailureStatus.CONNECTION_FAILURE, e));
}
and in onDisconnected(), dispose() is called on the connection. Without this manual dispose, the connections seem to leak.
After all these changes I'm still observing that the number of instances of io.micrometer.core.instrument.Meter$Id is constantly growing.
Versions I'm using:
"io.projectreactor:reactor-bom:2022.0.9"
"io.netty:netty-bom:4.1.96.Final"
@grzegorz-moto Is it necessary for the code below NOT to be included in the common configuration?
.doOnConnected(this::addHandlers)
.doOnConnected(tcpHandler::handleConnection)
.doOnDisconnected(tcpHandler::handleDisconnected)
.observe(tcpHandler.connectionObserver())
@grzegorz-moto I'm closing this one; we can reopen it when you are able to reply to the comment above.