TlsMetricsHandler throws NPE when used together with SniHandler
AndreasKasparek opened this issue
We have a Spring Cloud Gateway application using WebFlux that we recently updated from org.springframework.cloud version 2022.0.4 to 2023.0.0. In our bean configuration we set a NettyServerCustomizer that registers a no-op ChannelMetricsRecorder (all callback methods are empty), because we are not interested in the connection metrics but in the ByteBuf allocation metrics, which we could not enable otherwise.
@Bean
public WebServerFactoryCustomizer<NettyReactiveWebServerFactory> customizeNettyServerFactory() {
    // noopRecorder is a ChannelMetricsRecorder whose callback methods are all empty
    return factory -> {
        factory.addServerCustomizers(server -> server.metrics(true, () -> noopRecorder));
    };
}
With the previous Spring Cloud Gateway version this code worked fine, presumably because no SniProvider was created, but now it leads to an error when trying to establish an SSL connection.
Expected Behavior
No error when opening an SSL connection to the application (as server).
Actual Behavior
It seems that since the Spring update, org.springframework.boot.web.embedded.netty.SslServerCustomizer (spring-boot:3.2.0) now always calls reactor.netty.tcp.SslProvider.Builder#setSniAsyncMappings. As a result, the reactor.netty.tcp.SslProvider class creates a reactor.netty.tcp.SniProvider when constructed via the builder.
When channel metrics are enabled via the server customizer shown above, Reactor Netty automatically registers a TlsMetricsHandler from within the AbstractChannelMetricsHandler. This TLS metrics handler throws a NullPointerException when an SSL connection is established with the server.
Even though the change that triggered this is probably in the Spring code, I assume that the root cause is actually a bug in reactor-netty-core (see next section).
Steps to Reproduce
When an SniProvider exists, the SslProvider#addSslHandler method delegates the work to the SniProvider by calling SniProvider#addSniHandler. In that method a new SniHandler instance is added to the pipeline via:
pipeline.addFirst(NettyPipeline.SslHandler, newSniHandler()); // Please note that the first argument is a string constant.
The AbstractChannelMetricsHandler#channelRegistered method checks whether the pipeline contains an SSL handler by doing a lookup by name, and if so it registers a TLS metrics handler:
if (ctx.pipeline().get(NettyPipeline.SslHandler) != null) { // Note that this is again using the string constant!
    ctx.pipeline()
       .addBefore(NettyPipeline.SslHandler, // this too
               NettyPipeline.TlsMetricsHandler, tlsMetricsHandler());
}
The TlsMetricsHandler#channelActive method, however, asks the pipeline for a handler of class SslHandler by type instead of using the NettyPipeline.SslHandler name. An SniHandler is not an instance of SslHandler, so the pipeline contains no matching handler and the lookup returns null, which leads to the exception:
ctx.pipeline()
.get(SslHandler.class) // returns null
.handshakeFuture()
...
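The mismatch between the two lookups can be reproduced without Netty. The following is only a simplified model, not Reactor Netty code: MiniPipeline, FakeSslHandler, FakeSniHandler and the "sslHandler" name are hypothetical stand-ins, and the only assumption carried over from the real classes is that the SNI handler type is not a subtype of the SSL handler type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for io.netty.handler.ssl.SslHandler.
class FakeSslHandler {
    String handshakeFuture() {
        return "handshake";
    }
}

// Hypothetical stand-in for io.netty.handler.ssl.SniHandler.
// Deliberately NOT a subclass of FakeSslHandler.
class FakeSniHandler {
}

// Simplified model of a pipeline that supports lookup by name and by type.
class MiniPipeline {
    private final Map<String, Object> handlers = new LinkedHashMap<>();

    void add(String name, Object handler) {
        handlers.put(name, handler);
    }

    // Lookup by name: returns whatever was registered under that name.
    Object get(String name) {
        return handlers.get(name);
    }

    // Lookup by type: returns the first handler that is an instance of the type.
    <T> T get(Class<T> type) {
        for (Object handler : handlers.values()) {
            if (type.isInstance(handler)) {
                return type.cast(handler);
            }
        }
        return null;
    }
}

class PipelineLookupDemo {
    // Placeholder name; the real constant is NettyPipeline.SslHandler.
    static final String SSL_HANDLER_NAME = "sslHandler";

    public static void main(String[] args) {
        MiniPipeline pipeline = new MiniPipeline();
        // The SNI handler is registered under the SSL handler's name.
        pipeline.add(SSL_HANDLER_NAME, new FakeSniHandler());

        // The name-based check passes, so a metrics handler would be registered.
        System.out.println("found by name: " + (pipeline.get(SSL_HANDLER_NAME) != null));
        // The type-based lookup finds nothing; dereferencing it would throw an NPE.
        System.out.println("found by type: " + (pipeline.get(FakeSslHandler.class) != null));
    }
}
```

In this model, calling handshakeFuture() on the result of the type-based lookup would fail in the same way as the snippet above.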
Possible Solution
I don't know what the intention was, but I see two options. Either the SniHandler has to implement a common interface together with the SslHandler that exposes the handshakeFuture() needed by the TlsMetricsHandler, or the metrics handler should be registered only if a handler of type SslHandler is part of the pipeline, and not for SniHandlers (using lookup by type instead of by name).
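As a rough illustration of the second option, the registration check could be keyed on the handler type rather than on its name. This is a sketch with hypothetical stand-in classes (SslHandlerStandIn, SniHandlerStandIn) and hypothetical helper names. It is not the actual Reactor Netty code and not necessarily how #3023 addresses the problem.

```java
import java.util.List;
import java.util.Optional;

class TlsMetricsGuardSketch {
    // Hypothetical stand-ins for the two Netty handler types.
    static class SslHandlerStandIn {
    }

    static class SniHandlerStandIn {
    }

    // Returns the first handler that is an instance of the requested type, if any.
    static <T> Optional<T> findByType(List<?> handlers, Class<T> type) {
        return handlers.stream()
                .filter(type::isInstance)
                .map(type::cast)
                .findFirst();
    }

    // Register TLS metrics only when an actual SSL handler is present,
    // so a later lookup by type cannot come back empty.
    static boolean shouldRegisterTlsMetrics(List<?> handlers) {
        return findByType(handlers, SslHandlerStandIn.class).isPresent();
    }

    public static void main(String[] args) {
        System.out.println(shouldRegisterTlsMetrics(List.of(new SniHandlerStandIn())));
        System.out.println(shouldRegisterTlsMetrics(List.of(new SslHandlerStandIn())));
    }
}
```

With such a guard, a pipeline that only holds the SNI handler would not get a TLS metrics handler at registration time. Whether metrics should then be attached once the SNI handler is replaced by the selected SslHandler is a separate design question that this sketch does not cover.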
- Reactor version(s) used: reactor-core 3.6.0
- Other relevant libraries versions (eg. netty, ...): reactor-netty-core 1.1.13, spring-boot 3.2.0
- JVM version (java -version): Corretto-17.0.7.7.1
@AndreasKasparek Thanks for the detailed explanation! This should be fixed with #3023
Thank you very much @violetagg, that was very fast!