eclipse-ee4j / orb

Eclipse ORB is a CORBA orb for use in Jakarta EE and GlassFish and other projects that still need an ORB.

Home Page: https://projects.eclipse.org/projects/ee4j.orb

Deadlock during ORB shutdown

okummer opened this issue

I found the following two threads in a server with stuck JMX calls:

   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.glassfish.gmbal.impl.ManagedObjectManagerImpl.jmxRegistrationDebug(ManagedObjectManagerImpl.java:1225)
        - waiting to lock <0x00000000e3cf8858> (a org.glassfish.gmbal.impl.ManagedObjectManagerImpl)
        at org.glassfish.gmbal.impl.MBeanImpl.unregister(MBeanImpl.java:315)
        - locked <0x00000000e3df4ab8> (a org.glassfish.gmbal.impl.MBeanImpl)
        at org.glassfish.gmbal.impl.JMXRegistrationManager.unregister(JMXRegistrationManager.java:201)
        - locked <0x00000000e3f2e628> (a java.lang.Object)
        at org.glassfish.gmbal.impl.MBeanTree.unregister(MBeanTree.java:383)
        - locked <0x00000000e3cf88a0> (a org.glassfish.gmbal.impl.MBeanTree)
        at org.glassfish.gmbal.impl.MBeanTree.unregister(MBeanTree.java:378)
        - locked <0x00000000e3cf88a0> (a org.glassfish.gmbal.impl.MBeanTree)
        at org.glassfish.gmbal.impl.MBeanTree.clear(MBeanTree.java:419)
        - locked <0x00000000e3cf88a0> (a org.glassfish.gmbal.impl.MBeanTree)
        at org.glassfish.gmbal.impl.ManagedObjectManagerImpl.init(ManagedObjectManagerImpl.java:322)
        at org.glassfish.gmbal.impl.ManagedObjectManagerImpl.close(ManagedObjectManagerImpl.java:344)
        at com.sun.corba.ee.impl.orb.ORBImpl.destroy(ORBImpl.java:1516)
        at ...

   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.glassfish.gmbal.impl.MBeanTree.getMBeanImpl(MBeanTree.java:413)
        - waiting to lock <0x00000000e3cf88a0> (a org.glassfish.gmbal.impl.MBeanTree)
        at org.glassfish.gmbal.impl.ManagedObjectManagerImpl.getFacetAccessor(ManagedObjectManagerImpl.java:746)
        - locked <0x00000000e3cf8858> (a org.glassfish.gmbal.impl.ManagedObjectManagerImpl)
        at org.glassfish.gmbal.impl.TypeConverterImpl$3.toManagedEntity(TypeConverterImpl.java:435)
        at org.glassfish.gmbal.impl.TypeConverterImpl$TypeConverterListBase.toManagedEntity(TypeConverterImpl.java:900)
        at org.glassfish.gmbal.impl.AttributeDescriptor.get(AttributeDescriptor.java:110)
        at org.glassfish.gmbal.impl.TypeConverterImpl$3.toManagedEntity(TypeConverterImpl.java:436)
        at org.glassfish.gmbal.impl.AttributeDescriptor.get(AttributeDescriptor.java:110)
        at org.glassfish.gmbal.impl.MBeanSkeleton.getAttribute(MBeanSkeleton.java:526)
        at org.glassfish.gmbal.impl.MBeanSkeleton.getAttributes(MBeanSkeleton.java:572)
        at org.glassfish.gmbal.impl.MBeanImpl.getAttributes(MBeanImpl.java:362)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttributes(java.management@11.0.8/Unknown Source)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttributes(java.management@11.0.8/Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(java.management.rmi@11.0.8/Unknown Source)
        at ...

The two threads are trying to obtain locks on MBeanTree and ManagedObjectManagerImpl in an inconsistent order, leading to a deadlock. This prevents the ORB (and hence the server) from shutting down.
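The lock-order inversion in the two traces can be reproduced in isolation. The following is a minimal sketch, not gmbal code: the class name `LockOrderDemo`, the two plain `Object` monitors, and the thread names are hypothetical stand-ins for ManagedObjectManagerImpl and MBeanTree. A latch forces both threads to hold their first monitor before requesting the second, so the rare interleaving happens every time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDemo {

    /** Returns true once the JVM reports both demo threads as monitor-deadlocked. */
    public static boolean deadlockDetected() {
        // Hypothetical stand-ins for the two gmbal monitors from the thread dump.
        final Object managerLock = new Object();
        final Object treeLock = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread shutdown = new Thread(() -> {
            synchronized (treeLock) {              // like MBeanTree.clear()
                awaitOther(bothHoldFirstLock);
                synchronized (managerLock) { }     // like jmxRegistrationDebug()
            }
        }, "shutdown");
        Thread jmx = new Thread(() -> {
            synchronized (managerLock) {           // like getFacetAccessor()
                awaitOther(bothHoldFirstLock);
                synchronized (treeLock) { }        // like MBeanTree.getMBeanImpl()
            }
        }, "jmx-call");
        // Daemon threads, so the stuck pair does not keep the JVM alive.
        shutdown.setDaemon(true);
        jmx.setDaemon(true);
        shutdown.start();
        jmx.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {            // poll for up to about 5 seconds
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null && contains(ids, shutdown.getId()) && contains(ids, jmx.getId())) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    // Signals that this thread holds its first monitor, then waits for the other thread.
    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + deadlockDetected());
    }
}
```

In the real server no latch lines the threads up, which would be consistent with the deadlock only showing up when a JMX read happens to overlap with the shutdown.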

Thanks for the report! Which server did this concern, and what version of it?

This report applies to Glassfish 4.2.2. The server is a custom Java application that uses CORBA for outgoing connections and that is monitored over JMX. The other end of the CORBA connection also runs on Glassfish 4.2.2.

I guess it's not easy to try to reproduce this on the current version of GlassFish?

The code for GlassFish 4.x wasn't transferred to Eclipse, and GlassFish 4.x is essentially unsupported.

This is probably a rare bug, which we observed once during thousands or tens of thousands of shutdowns. I have little hope that I can reproduce it under controlled conditions.

Looking into the code, though, I see that the affected classes actually come from https://github.com/eclipse-ee4j/orb-gmbal and not from this repo. Should I recreate my issue there?

Over there, the code on the main branch and the line numbers have not changed since 4.0.0 (the release used by 4.2.2 of the ORB). There is still the pattern that a thread synchronized on ManagedObjectManagerImpl may want to synchronize on MBeanTree and that a thread synchronized on MBeanTree may want to synchronize on ManagedObjectManagerImpl.

In my specific case, the access to org.glassfish.gmbal.impl.ManagedObjectManagerImpl#jmxRegistrationDebugFlag in jmxRegistrationDebug() would not have to be synchronized. It would be sufficient to make the field jmxRegistrationDebugFlag volatile to enforce correct concurrency semantics. This would break the cycle.
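A minimal sketch of that idea, assuming the flag is a plain boolean that is only read in jmxRegistrationDebug(). The classes below are simplified, hypothetical stand-ins and not the actual gmbal sources; only the names jmxRegistrationDebugFlag, jmxRegistrationDebug, getFacetAccessor, getMBeanImpl and clear are borrowed from the traces. The same forced interleaving that would deadlock with a synchronized getter now completes, because the shutdown thread no longer needs the manager monitor.

```java
import java.util.concurrent.CountDownLatch;

public class VolatileFlagSketch {

    static class Manager {
        // volatile makes writes visible to readers without taking the manager monitor.
        private volatile boolean jmxRegistrationDebugFlag = false;
        final Tree tree = new Tree(this);

        // No longer synchronized: a thread that holds the tree monitor can call this freely.
        boolean jmxRegistrationDebug() {
            return jmxRegistrationDebugFlag;
        }

        // Still takes manager -> tree, as in the JMX getAttributes trace.
        synchronized String getFacetAccessor(CountDownLatch bothHoldFirstLock) {
            meet(bothHoldFirstLock);
            return tree.getMBeanImpl();
        }
    }

    static class Tree {
        private final Manager manager;

        Tree(Manager manager) {
            this.manager = manager;
        }

        synchronized String getMBeanImpl() {
            return "mbean";
        }

        // Holds the tree monitor, then reads the flag, as in the shutdown trace.
        synchronized boolean clear(CountDownLatch bothHoldFirstLock) {
            meet(bothHoldFirstLock);
            return manager.jmxRegistrationDebug();
        }
    }

    // Forces the worst-case interleaving: both threads hold their first monitor.
    private static void meet(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true if both threads finish, i.e. the cycle is gone. */
    public static boolean bothThreadsFinish() {
        Manager manager = new Manager();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread shutdown = new Thread(() -> manager.tree.clear(bothHoldFirstLock), "shutdown");
        Thread jmx = new Thread(() -> manager.getFacetAccessor(bothHoldFirstLock), "jmx-call");
        shutdown.setDaemon(true);
        jmx.setDaemon(true);
        shutdown.start();
        jmx.start();
        try {
            shutdown.join(5000);
            jmx.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !shutdown.isAlive() && !jmx.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both threads finished: " + bothThreadsFinish());
    }
}
```

This only removes the one edge of the cycle seen in the dump. Any other path that takes the manager monitor while holding the tree monitor would need the same scrutiny.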

There might be other cycles, but those that I could find immediately are harmless. org.glassfish.gmbal.impl.MBeanTree#setRoot calls org.glassfish.gmbal.impl.ManagedObjectManagerImpl#constructMBean, but only while the thread is already synchronized on ManagedObjectManagerImpl, so that's fine. MBeanImpl makes no other direct calls to ManagedObjectManagerImpl that I can find. Calls through the MBeanSkeleton might be problematic because of its reference back to the ManagedObjectManagerInternal, but this reference is probably only used in the analyze phase and not when responding to the MBeanImpl.
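The reason the setRoot path is harmless is that Java monitors are re-entrant: a thread that already owns the manager monitor can enter another synchronized method on the same object without blocking. A small sketch under that reading, where the classes and the entry point `createRoot` are hypothetical and only `setRoot` and `constructMBean` echo names from gmbal:

```java
public class ReentrantOrderSketch {

    static class Manager {
        final Tree tree = new Tree(this);

        // Hypothetical entry point: takes the manager monitor first, then the tree monitor.
        synchronized String createRoot() {
            return tree.setRoot();
        }

        // Re-entered by a thread that already owns the manager monitor.
        synchronized String constructMBean() {
            return "root-mbean";
        }
    }

    static class Tree {
        private final Manager manager;

        Tree(Manager manager) {
            this.manager = manager;
        }

        synchronized String setRoot() {
            // The calling thread must already own the manager monitor, so the
            // call below re-enters it instead of waiting for another thread.
            if (!Thread.holdsLock(manager)) {
                throw new IllegalStateException("setRoot called without the manager monitor");
            }
            return manager.constructMBean();
        }
    }

    public static String run() {
        return new Manager().createRoot();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The order here is always manager, then tree, then manager again by the same thread, so this path never waits on the manager monitor while another thread holds it.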

Long story short: It might well be that removing the synchronization for jmxRegistrationDebug() actually breaks the loop.