apache / plc4x

PLC4X The Industrial IoT adapter

Home Page: https://plc4x.apache.org/

[Bug]: plc4j-driver-opcua - Dead connection objects after session timeout

takraj opened this issue · comments

What happened?

OPC-UA connections become dead after the session timeout, which is hard-coded to 2 minutes in the driver. After this timeout, any request made with the connection object fails with INTERNAL_ERROR. With subscriptions it is even worse: a ClassCastException is thrown internally, complaining that a ServiceFault cannot be cast to a response object, and as a result the future returned earlier by request.execute() never completes.

This is especially a problem when using plc4j-connection-pool, which keeps a set of connection objects open, and thus users of the pool might get these dead connections.

The session timeout value is hard-coded in SecureChannel.java:400

Simplest code to reproduce the issue:

// Connect to the public demo server and build a single read request.
PlcDriverManager driverManager = new PlcDriverManager();
try (PlcConnection opcuaConnection = driverManager.getConnection("opcua:tcp://opcuaserver.com:48010")) {
    PlcReadRequest request = opcuaConnection.readRequestBuilder()
            .addItem(
                    "Demo",
                    "ns=2;s=Demo.Static.Scalar.String"
            )
            .build();

    // The first read succeeds while the session is still alive.
    PlcReadResponse response1 = request.execute().get();
    System.out.println("RESPONSE 1: " + response1.getObject("Demo"));

    // Stay idle for longer than the hard-coded 2-minute session timeout.
    Thread.sleep(3 * 60 * 1000); // 3 minutes

    // The second read on the same connection object fails with INTERNAL_ERROR.
    PlcReadResponse response2 = request.execute().get();
    System.out.println("RESPONSE 2: " + response2.getObject("Demo"));
}

Expected behavior would be to transparently recover from the error, and to create a new session internally.
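
Until the driver recovers on its own, the expected behavior can be approximated on the caller's side. The following is only a minimal sketch and not PLC4X code: ReconnectingClient, Operation and the connector callback are hypothetical names, and the sketch assumes that a dead session surfaces as an exception thrown by the operation. It drops the connection after a failure, opens a fresh one and retries once.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of PLC4X): keeps one connection-like resource and,
 * if an operation on it fails, discards it, opens a fresh one and retries once.
 */
class ReconnectingClient<C extends AutoCloseable> {

    /** An operation on the connection that may throw, e.g. executing a read request. */
    interface Operation<C, R> {
        R apply(C connection) throws Exception;
    }

    private final Callable<C> connector; // opens a new connection on demand
    private C current;                   // the connection currently in use, if any

    ReconnectingClient(Callable<C> connector) {
        this.connector = connector;
    }

    synchronized <R> R call(Operation<C, R> operation) throws Exception {
        if (current == null) {
            current = connector.call();
        }
        try {
            return operation.apply(current);
        } catch (Exception firstFailure) {
            // Assumption: the failure means the session expired. Drop the old
            // connection and retry exactly once on a fresh one; a second failure
            // is propagated to the caller.
            closeQuietly(current);
            current = connector.call();
            return operation.apply(current);
        }
    }

    private static void closeQuietly(AutoCloseable closeable) {
        try {
            closeable.close();
        } catch (Exception ignored) {
            // The connection is already considered dead, so close errors are ignored.
        }
    }
}
```

With PLC4X, the connector would wrap driverManager.getConnection(...), and the operation would have to turn an INTERNAL_ERROR response code into an exception itself. For subscriptions it would also need a bounded wait such as get(timeout, unit) on the future, since the future described above never completes.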

Version

v0.10.0

Programming Languages

  • plc4j
  • plc4go
  • plc4c
  • plc4net

Protocols

  • AB-Ethernet
  • ADS /AMS
  • BACnet/IP
  • CANopen
  • DeltaV
  • DF1
  • EtherNet/IP
  • Firmata
  • KNXnet/IP
  • Modbus
  • OPC-UA
  • S7

Would a periodic keep-alive request help in this case? I know some projects usually do this.

This timeout applies when no messages are sent on the channel. The assumption is that if you aren't using the channel, it should probably be closed and reopened when you need it. I will look for where in the OPCUA standard this is mentioned. If it's not, we should be able to set something up to poll data before the timeout occurs, but that kind of defeats the purpose of the timeout.

Using the latest snapshot and the code you provided, I was able to confirm that this timeout does close the channel if no messages are sent for 2 mins. I haven't had time to test it with subscriptions yet.

The timeout is hard-coded to 2 mins. I'm happy for this to become a configuration parameter since, yes, some people may want to poll servers less often than once every 2 mins.
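
As a rough illustration of what such a parameter could look like, here is a sketch that reads a timeout from the connection string and falls back to the current 2-minute value. The parameter name session-timeout and the class SessionTimeoutOption are made up for this example; they are not existing driver options.

```java
import java.time.Duration;

/** Hypothetical sketch: reads a session timeout from a connection string. */
class SessionTimeoutOption {

    /** The value that is currently hard-coded in the driver. */
    static final Duration DEFAULT = Duration.ofMinutes(2);

    /** Assumed parameter name, in milliseconds; not an existing driver option. */
    static final String PARAM = "session-timeout";

    static Duration parse(String connectionString) {
        int queryStart = connectionString.indexOf('?');
        if (queryStart < 0) {
            return DEFAULT; // no parameters given at all
        }
        for (String pair : connectionString.substring(queryStart + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(PARAM)) {
                long millis = Long.parseLong(pair.substring(eq + 1));
                if (millis <= 0) {
                    throw new IllegalArgumentException(PARAM + " must be positive");
                }
                return Duration.ofMillis(millis);
            }
        }
        return DEFAULT; // parameter not present
    }
}
```

For example, parse("opcua:tcp://opcuaserver.com:48010?session-timeout=300000") would yield 5 minutes. Whatever is requested this way is only a request, and the value the server actually grants may differ.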

@takraj Very happy to see these issues being reported :)

Well, I see the timeouts slightly differently. They are needed so that PLCs can detect that a client is gone and can therefore clear resources, especially in cases where only a small number of simultaneous connections are allowed. As long as a client is connected and isn't disconnecting, I don't think we should "throw him out"... So I would be strongly in favor of a keep-alive ping interaction.
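
A keep-alive along these lines could be sketched with a plain scheduled executor. This is an assumption-laden illustration, not driver code: KeepAlive is a hypothetical class, the ping is whatever cheap request the client chooses (for instance a small read), and the interval of half the timeout is an arbitrary safety margin.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: runs a cheap "ping" well before the session timeout expires. */
class KeepAlive implements AutoCloseable {

    // Daemon thread so a forgotten keep-alive does not keep the JVM running.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "session-keep-alive");
                thread.setDaemon(true);
                return thread;
            });

    /**
     * @param ping             any cheap request that counts as activity on the session
     * @param timeoutMillis    the session timeout in effect, in milliseconds
     */
    KeepAlive(Runnable ping, long timeoutMillis) {
        long interval = intervalMillis(timeoutMillis);
        scheduler.scheduleAtFixedRate(() -> {
            try {
                ping.run();
            } catch (RuntimeException e) {
                // An uncaught exception would suppress all later runs, so swallow it here.
            }
        }, interval, interval, TimeUnit.MILLISECONDS);
    }

    /** Pings at half the timeout (an assumed margin), but at least every millisecond. */
    static long intervalMillis(long timeoutMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        return Math.max(1, timeoutMillis / 2);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With the current 2-minute timeout this would ping once a minute. The timeout passed in should be the one actually in effect for the session rather than the one the client asked for.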

I don't really have a preference, I'm happy for someone to implement another keep alive in OPCUA (Yes there is another keep alive doing a very similar thing), but we should probably tidy up the error handling for subscriptions and have this as a parameter.

Keep in mind that the timeout value is negotiated between the client and server, and the server has the final say as to what this value will be.

There is a comment about it at https://reference.opcfoundation.org/Core/Part4/v104/docs/5.6.2 but, as is common with OPCUA, it doesn't go into much detail.

Also keep in mind that, yes, PLCs generally won't allow too many connections, so this is a way for them to free up resources from clients that aren't using their connections.