apache / plc4x

PLC4X: The Industrial IoT adapter

Home Page: https://plc4x.apache.org/

[Bug]: plc4j-driver-opcua seems to leak memory

takraj opened this issue

What happened?

The OPC-UA driver seems to leak memory when a connection object is closed. Sooner or later, this leads to an application crash with an OOM error. A workaround would be to reuse connection objects, but then we run into issue #1100, which I believe makes a fix for this pretty urgent.

Screenshot from 2023-09-23 14-14-10

Example code to reproduce the issue:

PlcDriverManager driverManager = new PlcDriverManager();

AtomicBoolean running = new AtomicBoolean(true);
Runtime.getRuntime().addShutdownHook(new Thread(() -> running.set(false))); // stop the loop when the JVM shuts down

while (running.get()) {
    PlcConnection connection = driverManager.getConnection("opcua:tcp://127.0.0.1:40185/milo?discovery=false");
    try {
        Thread.sleep(100);
    } finally {
        connection.close();
    }
}

Some additional charts

Screenshot from 2023-09-23 14-23-14

Screenshot from 2023-09-23 14-08-05

Screenshot from 2023-09-23 14-10-16

Screenshot from 2023-09-23 14-10-25

Logs

logs-before-oom.txt

Heap dumps

Version

v0.10.0

Programming Languages

  • plc4j
  • plc4go
  • plc4c
  • plc4net

Protocols

  • AB-Ethernet
  • ADS /AMS
  • BACnet/IP
  • CANopen
  • DeltaV
  • DF1
  • EtherNet/IP
  • Firmata
  • KNXnet/IP
  • Modbus
  • OPC-UA
  • S7

I'll have a look at this, as it seems no one else is currently planning to ... hopefully this doesn't require too much knowledge of the OPC-UA protocol.

Having now seen that you were using the pre-0.10.0 syntax for creating the connection, I think you have stumbled over a problem that we have already addressed in 0.10.0-SNAPSHOT ... we're planning on releasing that in the next few days. It would be great if you could check whether this is fixed for your case too.

I am currently running an updated version of your program in JProfiler and am not seeing any increase in memory usage.

Ok ... after running it for almost two hours ... there's definitely something going on ... not sure if it's because I'm also running the server in the same VM ... It's consuming quite a bit more memory and a lot more CPU time.
image

Ok ... just noticed I was running it in the IntelliJ profiler, not JProfiler ... I split the application into a server (that just starts Milo) and a client (with your code) ... when profiling only the client, I can see a constant increase in sleeping threads ... I should probably figure out why this is happening and fix it before the 0.11.0 release ...
image

And they all seem to be related to the NIO event loop group:

image
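A profiler is not strictly needed to spot this kind of growth. The following is a minimal sketch of a hypothetical `ThreadCensus` helper that uses only the JDK (`Thread.getAllStackTraces()`) to group live threads by name prefix and state, so that taking a snapshot before and after a batch of open/close cycles shows which group keeps growing. The prefix heuristic (stripping trailing digits and separators so that names like `nioEventLoopGroup-5-1` collapse into one group) is an assumption about thread naming, not something PLC4X or Netty guarantees.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical diagnostic helper: counts live threads grouped by "name prefix [STATE]". */
final class ThreadCensus {

    private ThreadCensus() {}

    /**
     * Strips trailing digits and '-', '_' or '#' separators, so that e.g.
     * "nioEventLoopGroup-5-1" and "nioEventLoopGroup-7-1" both become "nioEventLoopGroup".
     */
    static String prefixOf(String threadName) {
        return threadName.replaceAll("[-_#0-9]+$", "");
    }

    /** Returns a sorted map of "prefix [STATE]" to the number of live threads in that group. */
    static Map<String, Integer> snapshot() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String key = prefixOf(t.getName()) + " [" + t.getState() + "]";
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print one line per thread group, e.g. "4<TAB>nioEventLoopGroup [RUNNABLE]".
        snapshot().forEach((group, n) -> System.out.println(n + "\t" + group));
    }
}
```

Diffing two snapshots, for example before and after 100 iterations of the reproduction loop, isolates the group whose count rises steadily; sleeping timer threads would typically show up as `TIMED_WAITING` or `WAITING`.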

Hmpf ... I thought we had addressed that issue, and I guess it probably affects all the other drivers too ... Usually, opening and closing connections shouldn't be the default case, as the connection cache should be used, but I think having something stuck is also the reason why some applications simply don't stop gracefully.
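The connection-cache idea can be illustrated with a small, hypothetical sketch: instead of opening and closing a connection for every request, one connection per URL is kept and handed out again, and only replaced if it has dropped. `SimpleConnection`, `ConnectionCache` and the opener function are illustrative names, not PLC4X API; the PLC4X connection cache mentioned above is a separate component whose API is not reproduced here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a PLC connection (not the PLC4X PlcConnection interface). */
interface SimpleConnection extends AutoCloseable {
    boolean isConnected();

    @Override
    void close(); // narrowed: no checked exception
}

/** Minimal per-URL cache: reuses an open connection instead of reconnecting every time. */
final class ConnectionCache implements AutoCloseable {
    private final Map<String, SimpleConnection> connections = new ConcurrentHashMap<>();
    private final Function<String, SimpleConnection> opener;

    ConnectionCache(Function<String, SimpleConnection> opener) {
        this.opener = opener;
    }

    /** Returns the cached connection for the URL, reopening it only if it was lost. */
    SimpleConnection get(String url) {
        return connections.compute(url, (u, existing) -> {
            if (existing != null && existing.isConnected()) {
                return existing; // reuse: no new handshake, no new per-connection resources
            }
            if (existing != null) {
                existing.close(); // release whatever the dead connection still holds
            }
            return opener.apply(u);
        });
    }

    /** Closes every cached connection, e.g. on application shutdown. */
    @Override
    public void close() {
        connections.values().forEach(SimpleConnection::close);
        connections.clear();
    }
}
```

For simplicity the sketch opens new connections inside `compute`, which blocks other callers for the same key while the handshake runs; a production cache would also need health checks, timeouts and a borrow/return discipline for concurrent users.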

image
Should have been this image

YAY ... I found it :-)
It wasn't the event loops ... they are closed correctly.
It was the NettyHashTimerTimeoutManager, which we start for every connection but never explicitly close ... By overriding the close() method of ChannelDuplexHandler in Plc4xNettyWrapper and explicitly closing it there, the leak of open threads is gone ... you can see a commons-pool growing at the start, but as soon as it has reached its 11 threads, the thread count stays constant :-)
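The root cause described above follows a common pattern: a component that owns a thread is created per connection, so unless the connection's close path also stops that component, every open/close cycle leaves one more thread behind. The sketch below reproduces the pattern with plain JDK classes. `TimeoutManager` and `ConnectionHandler` are hypothetical names, and a single-threaded `ScheduledExecutorService` stands in for the timer inside the real `NettyHashTimerTimeoutManager` (presumably a Netty hashed-wheel timer, judging only by its name).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-connection timeout manager that owns one timer thread (started lazily). */
final class TimeoutManager implements AutoCloseable {
    private static final AtomicInteger IDS = new AtomicInteger();

    // Stand-in for a hashed-wheel timer: one worker thread per manager instance.
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(
            r -> new Thread(r, "timeout-manager-" + IDS.incrementAndGet()));

    /** Schedules a timeout callback for an outstanding request. */
    ScheduledFuture<?> schedule(Runnable onTimeout, long delayMillis) {
        return timer.schedule(onTimeout, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the timer thread; if nobody calls this, the thread outlives its connection. */
    @Override
    public void close() {
        timer.shutdownNow();
    }

    /** Waits for the timer to terminate; returns false on timeout or interrupt. */
    boolean awaitStopped(long millis) {
        try {
            return timer.awaitTermination(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

/** Hypothetical connection-side handler owning one TimeoutManager per connection. */
final class ConnectionHandler implements AutoCloseable {
    final TimeoutManager timeouts = new TimeoutManager();

    @Override
    public void close() {
        // The missing step: release the per-connection timer together with the connection.
        timeouts.close();
    }
}
```

Tying the timer's lifetime to the handler's `close()` mirrors the fix described in the comment; sharing a single timer across all connections would be an alternative design that avoids per-connection threads altogether.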

Please give the current 0.11.0-SNAPSHOT version a try and check if the problem is gone.

@chrisdutz I have tried the revision tagged with v0.11.0, but the problem does not seem to be resolved yet. :(

~/plc4x$ git log -1
commit d22042ef079ae93120770a95704571118d9c871e (HEAD, tag: v0.11.0)
Author: Christofer Dutz <cdutz@apache.org>
Date:   Mon Oct 2 09:51:59 2023 +0200

    [maven-release-plugin] prepare release v0.11.0

As you can see in the screenshot below, the heap usage is still constantly increasing:
image

Tested with the following code:

public class MemoryLeakDemo {

    public static void main(String[] args) throws Exception {
        PlcDriverManager driverManager = new DefaultPlcDriverManager();
        PlcConnectionManager connectionManager = driverManager.getConnectionManager();

        AtomicBoolean running = new AtomicBoolean(true);
        Runtime.getRuntime().addShutdownHook(new Thread(() -> running.set(false))); // stop the loop when the JVM shuts down

        while (running.get()) {
            PlcConnection connection = connectionManager.getConnection("opcua:tcp://127.0.0.1:12686/milo?discovery=false");
            try {
                Thread.sleep(100);
            } finally {
                connection.close();
            }
        }
    }
}

I think I have already handled the issue of thread pool leakage, but I'm not sure how to release the related resources.

V0.11.0

Please help me, @chrisdutz @takraj @hutcheb

image
image

dump file:

plc4x.zip

Hi,
It took me about three days, but I have fixed the thread and memory leak issues in v0.11.0. Could you create a new branch, 0.11.1-SNAPSHOT, so that I can submit my code?
The main issue is that the keepAlive in SecureChannel keeps the channel occupied for a long time without releasing it.
@chrisdutz @takraj @hutcheb
image
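The keepAlive explanation above points to a second, related pattern: a periodic task scheduled on a long-lived scheduler captures the channel, so the channel stays reachable, and cannot be garbage-collected, for as long as the task remains queued, even after the connection is closed. The following minimal sketch uses only JDK classes; `KeepAliveChannel` is a hypothetical stand-in for SecureChannel, and the real SecureChannel may implement its keep-alive quite differently. The fix shown is to cancel the task on close, with `setRemoveOnCancelPolicy(true)` making the executor drop cancelled tasks from its queue immediately rather than at their next scheduled time.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical channel with a periodic keep-alive; a stand-in for SecureChannel, not its real code. */
final class KeepAliveChannel implements AutoCloseable {
    final AtomicInteger keepAlivesSent = new AtomicInteger();
    private volatile ScheduledFuture<?> keepAlive;

    /** Schedules the keep-alive; the method reference captures 'this', so the queued task keeps the channel reachable. */
    void start(ScheduledThreadPoolExecutor scheduler, long intervalMillis) {
        keepAlive = scheduler.scheduleAtFixedRate(
                this::sendKeepAlive, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    private void sendKeepAlive() {
        keepAlivesSent.incrementAndGet(); // a real channel would write a keep-alive request here
    }

    /**
     * Cancels the keep-alive. With remove-on-cancel enabled on the scheduler, the queued task
     * (and with it the reference to this channel) is dropped right away.
     */
    @Override
    public void close() {
        ScheduledFuture<?> task = keepAlive;
        if (task != null) {
            task.cancel(false);
            keepAlive = null;
        }
    }
}
```

Shutting down the scheduler is not the fix in this pattern, because it is shared and long-lived; each channel has to cancel whatever it scheduled itself.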

In the world of PLCs, connections should generally be reused, as establishing a connection is pretty expensive for most protocols. I know that OPC-UA needs to resolve some things when connecting.

So I agree, that both issues should be addressed.

I'm just not sure when. If nobody else takes on the issue: OPC-UA-related issues are generally pretty low on my personal priority list 😉

Hi All,

I have the same issue, but with the Apache NiFi integration.

image

Should I create a new issue, or is this issue (#1101) sufficient?

Could a priority be assigned to address this issue?

Thank you in advance!

I would strongly assume it's the same issue, as to my knowledge all integration modules internally use the same components. So this issue should probably be observable in all integration modules.

@christofe-lintermans-actemium I believe that #1007 might address some of these issues, as it cleans up some driver internals which may have lost consistency over the last two releases (0.10-0.11) and fallen behind in stability. I am waiting for feedback from field testing with real devices, which should be completed this week or next, after which I would like to merge these changes to develop. Then the Apache NiFi team will be able to produce a new development build, which should be free of (major) memory leaks.

Aaaah ... cool ... well, in that case ... I guess as soon as these changes make it back to develop, that sounds like a good time for a release ;-) (I guess we've got enough stuff in there already)