netty / netty-tcnative

A fork of Apache Tomcat Native, based on finagle-native

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Native state can be corrupted when using netty in multiple classloaders

SgtSilvio opened this issue · comments

We have a setup with multiple classloaders that are used to isolate/hotreload extensions/plugins from the application:

  • Classloader A, parent, used for the application
    • Loads netty (netty-handler/buffer/...)
    • Loads netty-tcnative (netty-tcnative-classes/boringssl-static...)
  • Classloader B, child of A, used for an extension/plugin of the application
    • Loads netty (netty-handler/buffer/...)

We did not expect the extension/plugin in classloader B to use tcnative and thought it uses the JDK SSL.
But tcnative is actually used because classloader B finds tcnative via its parent classloader A.
So far this would not be a problem, if it would not result in the extension/plugin corrupting tcnative for the application. This manifests in failing TLSv1.2 handshakes.
We found the root cause for this: the native library is loaded twice in the same classloader A. Loading the native library twice in the same classloader results in a corruption of (static) state.
It is not allowed to load the library twice and netty also tries to avoid that by loading the library in a static block of the OpenSsl class. But this static block is problematic because:

  • The io.netty.handler.ssl.OpenSsl class is part of netty-handler
  • The native library is loaded in the classloader of io.netty.internal.tcnative.SSLContext
  • The io.netty.internal.tcnative.SSLContext class is part of netty-tcnative-classes

=> This results in the application having the static state of OpenSsl in classloader A, and the extension/plugin having the static state of OpenSsl in classloader B. Both load the native library in classloader A. The second loading corrupts the tcnative state that is already present.

This issue might sound weird, and we are not saying that what we do is how it should be done (for example which classes are loaded from the parent classloader), but the state of the native library should in our opinion be handled in the classloader that loads and manages the native library (for example as static state in the io.netty.internal.tcnative.Library class).
Because netty and netty-tcnative are separated (with also different versioning), it feels weird that the state whether the tcnative library is already loaded is handled by a class in netty.

We are looking forward to your opinions on this.

@SgtSilvio do you have any suggestion how to fix this without breaking things ?

Good question.
At least some static state inside the tcnative classes would be required.
The code inside OpenSsl.loadTcNative (https://github.com/netty/netty/blob/f8977fd8dbda1de76bce886c0042785e453eabed/handler/src/main/java/io/netty/handler/ssl/OpenSsl.java#L710) would need to be guarded by this static state.
What makes this a bit complicated is that the tcnative classes do not know about the netty utils that are used for actually loading the native library.
A little bit hacky solution would be to add a static synchronized load method to io.netty.internal.tcnative.Library that takes a lambda that does the actual loading and is only called if a private static boolean loaded is still false.
To make newer netty versions compatible with older netty-tcnative versions that still lack this load method, the OpenSsl code could fallback to its old implementation.

This is only a vague idea and needs some proper thoughts because it feels a bit hacky to me.

@SgtSilvio this sounds like a possible way to solve it... Do you feel able to come up with a PR for this ?