dogtagpki / jss

Network Security Services for Java is a Java interface to NSS

Home Page:https://dogtagpki.github.io/jss

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

NPE in Buffer.WriteCapacity()

edewata opened this issue · comments

Sometimes the Buffer.WriteCapacity() throws an NPE:

2021-07-02T18:29:32.7931657Z + pki -n caadmin ca-user-cert-add caagent --serial 0x7
2021-07-02T18:29:36.2129529Z java.lang.NullPointerException
2021-07-02T18:29:36.2131194Z 	at org.mozilla.jss.nss.Buffer.WriteCapacity(Native Method)
2021-07-02T18:29:36.2133683Z 	at org.mozilla.jss.ssl.javax.JSSEngineReferenceImpl.unwrap(JSSEngineReferenceImpl.java:1192)
2021-07-02T18:29:36.2136070Z 	at org.mozilla.jss.ssl.javax.JSSSocketChannel.read(JSSSocketChannel.java:276)
2021-07-02T18:29:36.2138079Z 	at java.base/java.nio.channels.SocketChannel.read(SocketChannel.java:486)
2021-07-02T18:29:36.2140097Z 	at org.mozilla.jss.ssl.javax.JSSSocketChannel.read(JSSSocketChannel.java:236)
2021-07-02T18:29:36.2142914Z 	at org.mozilla.jss.ssl.javax.JSSSocketChannel.implCloseSelectableChannel(JSSSocketChannel.java:381)
2021-07-02T18:29:36.2146867Z 	at java.base/java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:242)
2021-07-02T18:29:36.2150320Z 	at java.base/java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:112)
2021-07-02T18:29:36.2155237Z 	at org.mozilla.jss.ssl.javax.JSSSocket.close(JSSSocket.java:710)
2021-07-02T18:29:36.2156889Z 	at org.apache.http.impl.BHttpConnectionBase.close(BHttpConnectionBase.java:332)
2021-07-02T18:29:36.2159757Z 	at org.apache.http.impl.conn.LoggingManagedHttpClientConnection.close(LoggingManagedHttpClientConnection.java:81)
2021-07-02T18:29:36.2162361Z 	at org.apache.http.impl.conn.CPoolEntry.closeConnection(CPoolEntry.java:70)
2021-07-02T18:29:36.2164432Z 	at org.apache.http.impl.conn.CPoolEntry.close(CPoolEntry.java:96)
2021-07-02T18:29:36.2166083Z 	at org.apache.http.pool.AbstractConnPool.shutdown(AbstractConnPool.java:152)
2021-07-02T18:29:36.2169476Z 	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.shutdown(PoolingHttpClientConnectionManager.java:411)
2021-07-02T18:29:36.2172247Z 	at org.apache.http.impl.client.HttpClientBuilder$2.close(HttpClientBuilder.java:1244)
2021-07-02T18:29:36.2174234Z 	at org.apache.http.impl.client.InternalHttpClient.close(InternalHttpClient.java:201)
2021-07-02T18:29:36.2176300Z 	at com.netscape.certsrv.client.SubsystemClient.exists(SubsystemClient.java:99)
2021-07-02T18:29:36.2178142Z 	at com.netscape.cmstools.cli.MainCLI.createCAClient(MainCLI.java:315)
2021-07-02T18:29:36.2179958Z 	at com.netscape.cmstools.user.UserCertAddCLI.execute(UserCertAddCLI.java:96)
2021-07-02T18:29:36.2181551Z 	at org.dogtagpki.cli.CommandCLI.execute(CommandCLI.java:58)
2021-07-02T18:29:36.2183062Z 	at org.dogtagpki.cli.CLI.execute(CLI.java:357)
2021-07-02T18:29:36.2183981Z 	at org.dogtagpki.cli.CLI.execute(CLI.java:357)
2021-07-02T18:29:36.2184925Z 	at org.dogtagpki.cli.CLI.execute(CLI.java:357)
2021-07-02T18:29:36.2186245Z 	at com.netscape.cmstools.cli.SubsystemCLI.execute(SubsystemCLI.java:79)
2021-07-02T18:29:36.2187543Z 	at org.dogtagpki.cli.CLI.execute(CLI.java:357)
2021-07-02T18:29:36.2189007Z 	at com.netscape.cmstools.cli.MainCLI.execute(MainCLI.java:665)
2021-07-02T18:29:36.2190349Z 	at com.netscape.cmstools.cli.MainCLI.main(MainCLI.java:703)

The issue was observed several times in PKI CI, but it does not happen very often. It's not clear whether the pki ca-user-cert-add command has anything to do with it.

Relevant code:

@cipherboy wrote:

... theoretically, SocketChannel can be accessed from parallel thingies (which is why it and Socket have multiple Synchronized statements in it) ...
... maybe we do need a few well-placed synchronized methods here and there, around the initialization and closing.
beginHandshake and cleanup probably need synchronized (this) ?
The other idea I have is that perhaps there is a double-close somewhere in the HTTP/... code -- if implCloseSelectableChannel() gets called twice, I wonder if it fails in interesting ways? Maybe inside the synchronized(this) block we need something for the engine == null case?

I would say that this happens now very frequently, and sometimes multiple times if you re-run the CI for PKI.

@ckelleyRH Hmmm I wonder if something has changed in the NSS layer. It might be worth talking to Bob and seeing if downgrading NSS versions helps (around F32's perhaps).

I created PR #787 as a temporary workaround to avoid CI failures and to unblock QE, and also to help isolate the problem. If the CI no longer breaks with this PR, we can be sure that it's caused by NSS changes.

On F33 it will use NSS 3.57, on F34 it will use NSS 3.63. They generate different PKCS11Constants, so the PKCS11Constants tests need to be disabled.

@rjrelyea We have a tool (written by @cipherboy) that parses the constants from pkcs11t.h and pkcs11n.h and put them into PKCS11Constants.java so they are accessible from Java applications. There are differences between NSS 3.57, 3.63, and 3.67, and we only store one version of PKCS11Constants.java in JSS source, so it's not possible to match it against different NSS versions.

@rjrelyea Just for some context, check out this old Pagure issue: https://pagure.io/jss/issue/26 --- the TL;DR is that JDK used to ship the Java equivalent of pkcs11*.h and we needed this for some (NSS) API calls. We ended up parsing the C header files (shipped by NSS) and generating our own equivalent of PKCS11Constants automatically. Every so often, as NSS updates, we update it as well.

It usually isn't a big deal, though we did have a CI test just to remind us to update it every so often. The CI test checks our parsed constants against NSS (generating a bunch of small C compilation units and comparing the values) and also comparing against JDK8 (when run under JKD8 --- JDK9+ removed them from the public package section). Usually they line up, but occasionally I've found bugs in either and reported them as appropriate. :-)

However, this all is certainly unrelated to the NPE at hand. The TLS interface doesn't make use of any PKCS#11 constants at all.


Edit: And since I realized I forgot to specify what I wanted your input on, @rjrelyea, it was regarding the close of a TLS connection: has anything changed at the NSS or NSPR levels around closing TLS connections recently? Perhaps a lower-level socket gets closed before a higher one reports as closed now, or visa-versa?