libplctag / libplctag.NET

A .NET wrapper for libplctag.

Home Page: https://libplctag.github.io/


Performance issues - CPU usage

VitorKawao opened this issue · comments

Hello,
I am having some performance issues. The CPU usage is too high. I created a simple test that reads 1500 tags.

  1. Sync read

         string prefix = "TAG_INT_";
         string fmt = "0000";
         Console.WriteLine($"Start: {DateTime.Now.ToString("dd/MM/yyyy hh:mm:ss.fff tt")}");
         for (int i = 1; i <= 1500; i++)
         {
             string tagName = prefix + i.ToString(fmt);
             var myTag = new Tag<DintPlcMapper, int>();
             myTag.Name = tagName;
             myTag.Gateway = "192.168.1.150";
             myTag.Path = null;
             myTag.PlcType = PlcType.Micro800;
             myTag.Protocol = Protocol.ab_eip;
             myTag.Timeout = TimeSpan.FromMilliseconds(500);
             myTag.AllowPacking = true;
             var result = myTag.Read();
             Console.WriteLine($"{tagName} = {result} {DateTime.Now.ToString("dd/MM/yyyy hh:mm:ss.fff tt")}");
         }
    

The CPU usage goes to 10% (my PC is a good one), and it took about 17 s to read them all.

  2. Async read
I tried reading asynchronously with AutoSyncReadInterval and ReadCompleted.

        string prefix = "TAG_INT_";
        string fmt = "0000";
        Console.WriteLine($"Start: {DateTime.Now.ToString("dd/MM/yyyy hh:mm:ss.fff tt")}");
        for (int i = 1; i <= 1500; i++)
        {
            string tagName = prefix + i.ToString(fmt);
            var myTag = new Tag<DintPlcMapper, int>();
            myTag.Name = tagName;
            myTag.Gateway = "192.168.1.150";
            myTag.Path = null;
            myTag.PlcType = PlcType.Micro800;
            myTag.Protocol = Protocol.ab_eip;
            myTag.Timeout = TimeSpan.FromMilliseconds(500);
            myTag.AllowPacking = true;
            myTag.AutoSyncReadInterval = TimeSpan.FromMilliseconds(1000);
            myTag.ReadCompleted += MyTag_ReadCompleted;
            _ = myTag.ReadAsync();
        }

    private static void MyTag_ReadCompleted(object sender, TagEventArgs e)
    {
        if (sender is Tag<DintPlcMapper, int> myTag)
        {
                Console.WriteLine($"{myTag.Name} = {myTag.Value} {DateTime.Now.ToString("dd/MM/yyyy hh:mm:ss.fff tt")}");
        }
    }

It reads each tag every 3 s, and the CPU usage goes to 15%.
Is there any way to keep the CPU usage from growing too much?

How are you running this code in a loop?

The first example must be accidentally dropping the entire PLC connection from time to time to be that much slower. The Micro800 cannot pack multiple requests into a PDU, so there should not be a lot of difference between the two examples. @jkoplo @timyhac ?

Note that a Micro800 is not a fast PLC at all. Reading 1500 tags as fast as you can will probably drive the CPU usage of the PLC very high. Make sure you measure the CPU load of the PLC! The libplctag DLL itself is capable of overwhelming a network connection to a ControlLogix L80 when running on a Raspberry Pi.

One way the connection could be dropped is if the GC runs and closes the tag handle between iterations of the loop.
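As a minimal sketch of that failure mode and its fix (assuming the same tag settings as the first example), keeping each tag in a collection that outlives the loop prevents the GC from finalizing it and closing the native handle mid-test:

    // Sketch: hold a reference to each tag so the GC cannot finalize it
    // (closing its native handle and connection) between loop iterations.
    var tags = new List<Tag<DintPlcMapper, int>>();
    for (int i = 1; i <= 1500; i++)
    {
        var myTag = new Tag<DintPlcMapper, int> { /* same settings as above */ };
        tags.Add(myTag);           // the tag now outlives the loop body
        var result = myTag.Read();
    }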

The example code shows that there are multiple things being tried. These two examples are not really equivalent. I suggest starting with the lowest level of code possible, the C DLL.

  1. Download the latest C DLL release.
  2. Unpack it and find the async_stress.exe program.
  3. Run the program with two arguments on the command line where you unpacked it. The first argument is the number of tags to create. The second is the tag attribute string to use. As far as I can tell, your tag attribute string should be protocol=ab-eip&gateway=192.168.1.150&plc=Micro800&name=TAG_INT_1
  4. Capture the output (there will be a LOT of output!!). On Windows you should be able to redirect the output to a log file. The program will run until you hit Ctrl-C!

The command line should look something like this:

C:\SOMEDIR> async_stress.exe 1500 "protocol=ab-eip&gateway=192.168.1.150&plc=Micro800&name=TAG_INT_1" > log.txt 2>&1

When I run this against my own PLC on my WiFi network, it fails because it takes more than 5000ms (the default timeout) to create all the tags and do the first read. My WiFi network has much higher latency than a wired network.

In the log file, you will find a line that looks like this: --- Ran 10 iterations with a total io time of 9048ms and min/avg/max of 13ms/904ms/1712ms.

This is what I get when testing with 100 tags.

I am using a ControlLogix L80 system but preventing it from using packed requests to simulate a Micro800. Again, my network is quite bad so you should see much smaller numbers.

Try this and attach the results here. From the numbers I see, I am not sure you can get the total time much lower. The async_stress.exe program will run until it gets an error or you type Ctrl-C. If you let it run, you can see what the CPU load looks like and we can determine whether the load is due to the C DLL or the .Net wrapper.

Thank you for your answer.
The problem happened at a client that uses a ControlLogix; the CPU usage grows to 70%.
I tested with a Micro800 because it is the PLC that I have here.

Later I'll test in C, to check whether the CPU problem occurs in C or only in C#.

The first example must be accidentally dropping the entire PLC connection from time to time to be that much slower. The Micro800 cannot pack multiple requests into a PDU, so there should not be a lot of difference between the two examples. @jkoplo @timyhac ?

It doesn't make sense in the second example to use both ReadAsync and AutoSyncReadInterval. Also, ReadAsync isn't awaited.

What Kyle said about tags being garbage collected is correct. Tags are designed to be long-lived. Store them somewhere that outlives the loop (for example, a field or a list) to prevent them from being garbage collected.

For what it's worth - when my PC is idling, it sits at about 12% CPU usage. Everything is relative when talking about performance, so it would be helpful if you can provide some more context about what the problem is.

Here is an example C# program that should help to analyze the .NET side of things more precisely:

  • It separates Initialization from Reads.
  • It does not print anything to console (which can itself use a non-trivial amount of CPU when printing a lot).
  • It won't invoke garbage collection.
internal class Program
{
    static void Main(string[] args)
    {
        var test = new LoadTest();

        // Don't measure here
        test.Setup();

        // Measure here
        while(true)
        {
            test.Test();
        }
    }
}

class LoadTest
{
    List<Tag<DintPlcMapper, int>> tags = new();

    public void Setup()
    {
        for (int i = 1; i <= 1500; i++)
        {
            var myTag = new Tag<DintPlcMapper, int>
            {
                Name = $"TAG_INT_{i:0000}",
                Gateway = "192.168.1.150",
                Path = null,
                PlcType = PlcType.Micro800,
                Protocol = Protocol.ab_eip,
                Timeout = TimeSpan.FromMilliseconds(500),
                AllowPacking = true
            };
            myTag.Initialize();
            tags.Add(myTag);
        }
    }

    public void Test()
    {
        foreach (var tag in tags)
        {
            tag.Read();
        }
    }
}
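One way to quantify the "Measure here" section without per-tag console output is a simple Stopwatch harness; this is an illustrative addition, not part of the original example:

    // Illustrative timing harness (not part of the original example):
    // time one full pass over all 1500 tags.
    var sw = System.Diagnostics.Stopwatch.StartNew();
    test.Test();
    sw.Stop();
    Console.WriteLine($"One pass took {sw.ElapsedMilliseconds} ms");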

@timyhac in your example code it won't do any tag packing since it's reading each tag synchronously. So you'll get one tag read per round trip - the throughput will be pretty low. But if we're just looking at CPU usage overhead, maybe that's intended.
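For comparison, a hedged sketch of an asynchronous variant of LoadTest.Test() that issues all reads concurrently, so the library can pack requests on PLCs that support it (the Micro800 does not):

    // Hypothetical async variant of LoadTest.Test(): start all reads at
    // once so the C library can pack multiple requests per packet on PLCs
    // that support packing. Needs System.Linq and System.Threading.Tasks.
    public async Task TestAsync()
    {
        await Task.WhenAll(tags.Select(t => t.ReadAsync()));
    }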

I really need to carve out some time to finish performance benchmarking and improvements.

That said, using the techniques in that performance issue, one of my coworkers benchmarked libplctag.net against Ingear and we were faster in pretty much all cases. This was real-world against a clogix PLC.

So we got that going for us, which is nice.

The Micro800 does not support request packing anyway.

When I test this on my network faking a Micro800, it takes more than 5 seconds to just create the 1500 tags. But my network is not good.

Any update on timing tests?

I tested using timyhac's example, initializing the tags and adding them to a list, then reading.
There are still performance issues. It takes about 8 s to initialize, then 9 s to read all 1500 tags, and the CPU usage of the exe grows to 13%.

My PC is an Intel(R) Core(TM) i7-3770 CPU @ 3.40 GHz.
I tested on another PC (Intel(R) Core(TM) i5-3330 CPU @ 3.00 GHz) and the CPU usage grows to 30%.

I haven't tested using C yet.

Hi Vitor,

Let me restate the results so that I make certain that I understand them.

When you test with Tim's program, it takes 8 seconds to create the tags and then takes 9 seconds to read all 1500 tags. Is that correct?

What is the network latency from your PC to the Micro800 PLC? You can use the ping.exe tool on the command line to get this information.
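If it is easier to capture latency from the same .NET process, here is a sketch using System.Net.NetworkInformation (the address is the one from the earlier examples):

    // Sketch: measure round-trip latency from code instead of ping.exe.
    // 192.168.1.150 is the gateway address from the earlier examples.
    using var ping = new System.Net.NetworkInformation.Ping();
    for (int i = 0; i < 10; i++)
    {
        var reply = ping.Send("192.168.1.150", 1000); // 1000 ms timeout
        Console.WriteLine($"{reply.Status} time={reply.RoundtripTime} ms");
    }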

I modified the async_stress.c program to do essentially what Tim's .Net version does above. Here are some results simulating a Micro800 (running with the PLC type as Micro800).

First I tested my network:

ping 10.206.1.40
PING 10.206.1.40 (10.206.1.40): 56 data bytes
64 bytes from 10.206.1.40: icmp_seq=0 ttl=64 time=13.431 ms
64 bytes from 10.206.1.40: icmp_seq=1 ttl=64 time=6.810 ms
64 bytes from 10.206.1.40: icmp_seq=2 ttl=64 time=8.369 ms
64 bytes from 10.206.1.40: icmp_seq=3 ttl=64 time=6.937 ms
64 bytes from 10.206.1.40: icmp_seq=4 ttl=64 time=15.009 ms
64 bytes from 10.206.1.40: icmp_seq=5 ttl=64 time=14.699 ms
64 bytes from 10.206.1.40: icmp_seq=6 ttl=64 time=7.919 ms
64 bytes from 10.206.1.40: icmp_seq=7 ttl=64 time=14.267 ms
64 bytes from 10.206.1.40: icmp_seq=8 ttl=64 time=12.136 ms
64 bytes from 10.206.1.40: icmp_seq=9 ttl=64 time=15.808 ms
64 bytes from 10.206.1.40: icmp_seq=10 ttl=64 time=14.602 ms
64 bytes from 10.206.1.40: icmp_seq=11 ttl=64 time=16.535 ms
64 bytes from 10.206.1.40: icmp_seq=12 ttl=64 time=15.528 ms
64 bytes from 10.206.1.40: icmp_seq=13 ttl=64 time=6.941 ms
64 bytes from 10.206.1.40: icmp_seq=14 ttl=64 time=15.566 ms
^C
--- 10.206.1.40 ping statistics ---
15 packets transmitted, 15 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 6.810/12.304/16.535/3.625 ms

As you can see, this is a bad network. The WiFi latency varies from 6ms to over 16ms. With one packet round trip taking 16ms, the fastest the C library can read tags is 60 tags per second if the PLC introduces no additional latency.

Here is a test of 100 tags:

Hit ^C to terminate the test.
Creation of 100 tags took 743ms.
Read of 100 tags took 639ms.
Read of 100 tags took 647ms.
Read of 100 tags took 555ms.
Read of 100 tags took 695ms.
Read of 100 tags took 555ms.
Read of 100 tags took 523ms.
Read of 100 tags took 534ms.
Read of 100 tags took 514ms.
Read of 100 tags took 527ms.
Read of 100 tags took 537ms.
Read of 100 tags took 540ms.
Read of 100 tags took 540ms.
Read of 100 tags took 523ms.
Read of 100 tags took 637ms.
Read of 100 tags took 639ms.
Read of 100 tags took 654ms.
Read of 100 tags took 637ms.
Read of 100 tags took 497ms.
Program terminated!
--- Ran 19 iterations with a total io time of 10393ms and min/avg/max of 497ms/547ms/695ms.

So in this case, it looks like it takes an average of about 5.47 ms per tag read (547 ms / 100 tags).

Now I'll try 1000 tags:

Hit ^C to terminate the test.
Creation of 1000 tags took 7005ms.
Read of 1000 tags took 7118ms.
Read of 1000 tags took 5184ms.
Read of 1000 tags took 5223ms.
Read of 1000 tags took 6112ms.
Read of 1000 tags took 8075ms.
Read of 1000 tags took 6733ms.
Read of 1000 tags took 5298ms.
Read of 1000 tags took 5382ms.
Read of 1000 tags took 7997ms.
Read of 1000 tags took 5249ms.
Read of 1000 tags took 5686ms.
Read of 1000 tags took 5623ms.
Read of 1000 tags took 5928ms.
Read of 1000 tags took 7913ms.
Read of 1000 tags took 5374ms.
Read of 1000 tags took 5201ms.
Read of 1000 tags took 5376ms.
Read of 1000 tags took 5811ms.
Program terminated!
--- Ran 19 iterations with a total io time of 109283ms and min/avg/max of 5184ms/5751ms/8075ms.

Here we can see that it takes 5.7 ms per tag read (5751 ms / 1000 tags). About the same. But it still takes 5.7 seconds to read all 1000 tags. There were some timeouts that appear to be due to transient latency (my network sometimes hangs). Since there is little change in time per tag read between the 100-tag case and the 1000-tag case, we can simply multiply to find that 1500 tags would take about 8.6 seconds to read. That is not much different from what Vitor measured.

The system load on my Mac Air M1 was between 40 and 60% of a CPU. The reason I say a CPU is that this laptop has 4 performance cores and 4 efficiency cores. As far as I can tell the program is running on one of the efficiency cores which is much lower performance. The system as a whole was about 85% idle (this fluctuated).

Here's the same test, but run against a ControlLogix:

Hit ^C to terminate the test.
Creation of 1000 tags took 6837ms.
Read of 1000 tags took 97ms.
Read of 1000 tags took 946ms.
Read of 1000 tags took 103ms.
Read of 1000 tags took 88ms.
Read of 1000 tags took 89ms.
Read of 1000 tags took 90ms.
Read of 1000 tags took 92ms.
Read of 1000 tags took 87ms.
Read of 1000 tags took 85ms.
Read of 1000 tags took 83ms.
Read of 1000 tags took 90ms.
Read of 1000 tags took 88ms.

....

Read of 1000 tags took 84ms.
Read of 1000 tags took 86ms.
Read of 1000 tags took 111ms.
Read of 1000 tags took 104ms.
Read of 1000 tags took 93ms.
Read of 1000 tags took 99ms.
Read of 1000 tags took 90ms.
Read of 1000 tags took 104ms.
Read of 1000 tags took 104ms.
Read of 1000 tags took 108ms.
Read of 1000 tags took 95ms.
Read of 1000 tags took 99ms.
Read of 1000 tags took 96ms.
Read of 1000 tags took 97ms.
Read of 1000 tags took 100ms.
Read of 1000 tags took 97ms.
Read of 1000 tags took 93ms.
Read of 1000 tags took 74ms.
Program terminated!
--- Ran 497 iterations with a total io time of 70779ms and min/avg/max of 74ms/142ms/7286ms.

Notice the much higher variance. 74ms to 7286ms. That is how bad my WiFi network is. But the average is interesting. On average it takes 142ms to read 1000 tags. That is about 40x faster due to the ability to pack multiple requests per network packet to the PLC when targeting a ControlLogix.

Total load on the system was lower, around 90% idle with a single efficiency core at 30-50%.

My results show that while there is CPU load, it isn't that much and the overall throughput to a ControlLogix is high even with a bad network.

There are a lot of variables here. I am using macOS and an ARM CPU instead of Windows and x86-64.

This is not with the version of async_stress.c that is included in the normal release, but one that I changed to be about the same logic as Tim's version.

Hello Kyle,

About your questions

When you test with Tim's program, it takes 8 seconds to create the tags and then takes 9 seconds to read all 1500 tags. Is that correct?
This is a problem, but the biggest problem is the CPU usage. On my client's PC it grows up to 70%, so the PC freezes.

What is the network latency from your PC to the Micro800 PLC?
The communication is good:
[image: ping results]

The problem at my client is with a ControlLogix. I am testing with a Micro800 because it is the PLC that I have here.
I'll try to test there too, to check whether the ControlLogix is much faster than the Micro800, as in your example.

To use AllowPacking with a ControlLogix, is setting the tag attribute enough, or do I need to do something else? Does it work for all kinds of tags or only for arrays?


Hi Kyle, we have spoken before by email. I appreciate all the messages you, Tim, and jkoplo have sent so far. It is great to be able to interact with and learn from you; you are really fast to respond and also proactive about running local tests.

Regarding Vitor's answer, I would like to add some points and put into words some doubts we still have. Basically, in the Micro800 test we had very similar results, but when communicating with a ControlLogix our results are still far from the performance you got in your last tests.
So it is important to understand the packing feature because, depending on our test scenario, although we set AllowPacking to true, maybe we are not actually making use of it.
In our customer's communication we read different types of tags (inside these 1500 tags). They are not ordered (we might have a boolean, then some integers or reals, and then more booleans again), and the tag names vary a lot (they are not organized as in our test case, "TAG_INT_"). Should packing still work in that scenario? Is "myTag.AllowPacking = true;" the only thing that needs to be set?

We will try to schedule remote access to the customer's computer to run some tests there again: check CPU usage, and check the processor, cores, and clock speed. I know it is a new Dell computer, but I don't know the architecture and processor. As you mentioned, on your computer one core got close to 50%, but you still have seven other cores, so the overall CPU usage is very low. While accessing the computer, we will try to run another test: read with Vitor's program from TAG_INT_1 up to TAG_INT_1500, always reading an INT for all the tags. If we get different results, we probably need to understand AllowPacking better (as asked above); if the results are similar, it probably means the client machine doesn't have enough hardware.
I might be wrong about some of the statements above, so please feel free to correct me.

I tried to summarize our thoughts and doubts, sorry for the long message. And thanks again.
Att,
Fabio Carvalho


This whole issue (#224) has a bunch of performance info, but I started benchmarking here. The code I recommend for reading a bunch of tags is below (really just the one line). I didn't do anything special when instantiating tags - it's about as close to the basic example for DINTs as you can get on CLogix.

    internal static async Task ReadWhenAllAsync(List<TagDint> allTags)
    {
        var stopwatch = new Stopwatch();

        stopwatch.Restart();
        await Task.WhenAll(allTags.Select(x => x.ReadAsync()));
        stopwatch.Stop();
        Console.WriteLine($"Tag ReadWhenAllAsync = {stopwatch.ElapsedMilliseconds} msec");
    }
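For context, a sketch of how this helper might be driven, assuming the tags were created and initialized beforehand (as in Tim's LoadTest example); CreateAndInitializeTags is a hypothetical setup helper:

    // Sketch: drive the helper above with pre-initialized, long-lived tags.
    List<TagDint> allTags = CreateAndInitializeTags(); // hypothetical setup helper
    await ReadWhenAllAsync(allTags);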

I've also done some long-term testing with this code - running a read loop that was pulling around 100 tags every second over a 24-hour period or something like that. I didn't see any memory leaks or runaway CPU usage.

I have some work in a branch that's looking to really dig in and optimize how we manage the async reads to the base library. Unfortunately I haven't had time to finish it up and prove out some hunches. My initial thought is that there should be room to reduce our overhead in terms of memory (and probably CPU).


Oh, it's also worth noting that @kyle-github has a very basic mock program that pretends to be a PLC for testing. I don't think it supports all tag types, but it would probably be preferable to test against that instead of a Micro800 to help isolate your code performance.

Hello,

I ran a simple project on my client's machine. The PLC is a ControlLogix, and the communication between the machine and the PLC is good:
[image: ping results]

I read the tag names from a file; there are INTs, REALs, and BOOLs (there are a lot of booleans).
I had much worse performance than Kyle.

The first test was with 500 tags:
[image: test results]

The CPU usage grows to 26%.

The second test, with 1000 tags:
[image: test results]

The CPU usage grows to 26%.

The third test, with 1500 tags:
[image: test results]

The CPU usage grows to 26%.

It seems that BOOL tags are slower to read than numeric tags, and the PLC has more booleans than anything else.

The test is with Read(). I tried ReadAsync(); it is much faster at the beginning, but it takes a long time for the last tag to arrive at the ReadCompleted handler.

Just want to add some information to Vitor's tests today.

As he mentioned, the ReadAsync call is very fast, but when we use ReadCompleted to actually get the value (with await), for the 1500 tags we wait 26 seconds until we get the last tag read response.

I have also read an interesting document from Software Toolbox giving some tips to improve ControlLogix communications. Of course, it is about their software, but the hints are interesting.

https://support.softwaretoolbox.com/ci/fattach/get/20774/1245932540/redirect/1/filename/Optimizing_ControlLogix_Communications.pdf

According to the documentation above:

Steps to Maximum Throughput
1) Use Global Tags only
2) Use Array Tags
3) Keep PLC Tag Names Short
4) Set PLC CPU time slice to 40%-50%.
5) Define Alias Tags with RSLogix 

The use of short tag names in the PLC for tags other than arrays is also of great benefit. 
This is because the TOP Server packs the PLC tag addresses into the Multi-Item Request packet sent to the ControlLogix. 
The 500-byte limit is what makes the shortness of the tag addresses so critical. Creating all required PC communications tags under the Global file is one way of shortening the names because Global tags require the least amount of space in the MultiItem Request Packet. 
Local (i.e. Program) tags may seem nice, but in the ControlLogix to get at a program tag (vs. a Global) we also have to put in the packet the text “Program:ProgramName.” plus the tag name! – You can see how 500 bytes can go fast.

I am not sure how relevant this is when we talk about libplctag, but since our tests have different tag names (example below):
PROGRAM:MAINPROGRAM.HLL_FT1012.LOWLIMIT
PROGRAM:MAINPROGRAM.HLL_FT1033.HIGHLIMIT
PROGRAM:MAINPROGRAM.HLL_FT1033.LOWLIMIT
PROGRAM:MAINPROGRAM.HLL_FT1034.HIGHLIMIT
PROGRAM:MAINPROGRAM.HLL_FT1034.LOWLIMIT
PROGRAM:MAINPROGRAM.HLL_FT1037.HIGHLIMIT
PROGRAM:MAINPROGRAM.TIC_1010_BLOCO.PGAIN
PROGRAM:MAINPROGRAM.TIC_1010_BLOCO.IGAIN
PROGRAM:MAINPROGRAM.TIC_1010_BLOCO.DGAIN

This "big tag names" might be impacting our capability to use the packing feature. Does it make sense? I am trying to understand why our tests have such bad performance agains Kyle's tests. Kyle, are you testing read or read async? From my understanding you are readind sync.

Even if CPU usage or read latency could be reduced, what makes the current CPU usage/read latency too high? i.e., what problem are you trying to solve?

Vitor's message indicates that the problem is that the "PC freezes", and that high CPU usage is a contributing factor - can you elaborate on this?

the biggest problem is the CPU usage. On my client's PC it grows up to 70%, so the PC freezes.


Some other notes:

  • Enable libplctag debug logs to assist in your investigation. Example here; see also the sketch after this list.
  • "The CPU usage grows to 26%" - this sentence is ambiguous; what is it growing from, and how are you measuring these numbers? Ideally, the measurement would be done in code so that others can easily execute the test, and you can be more precise in measuring specific code paths. But if you're using a tool, which tool, and how are you using it?
  • async methods should usually be awaited. Triggering a read with ReadAsync and listening for ReadCompleted would be an uncommon and advanced use case. Where are exceptions being caught?
  • Did something change? - the CPU usage number was previously 70% and yesterday it was 26%. It would be useful to post the code (and any other relevant conditions) used to produce any results.
  • AllowPacking is enabled by default and only used for devices that support it - it is exposed so that you can disable it. To reduce the number of differences between the customer environment (with the ControlLogix) and your development environment (with the Micro800), you could disable it.
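A sketch of enabling the wrapper's debug logging, based on libplctag.NET's static LibPlcTag class (verify the exact member names against your wrapper version):

    // Sketch: turn on libplctag debug output through the .NET wrapper.
    // Check the exact API (DebugLevel values, LogEvent args) for your version.
    LibPlcTag.DebugLevel = DebugLevel.Detail;
    LibPlcTag.LogEvent += (sender, e) => Console.WriteLine($"{e.DebugLevel}\t{e.Message}");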

To add further context, it would also be helpful to know the following:

  • What model ControlLogix is being used? There are major differences in the CPUs and how communications are prioritized between 1756-L6X, 1756-L7X, and 1756-L8X series PLCs.
  • If it is not an 8 series PLC, what is System Overhead Time Slice?
  • What firmware revision is being used?
  • What is handling communications? The embedded Ethernet port (if 8 series), a 1756-ENBT, or a 1756-EN2T?
  • Are any gateways in use? Are communications running directly from the PC to the Ethernet card with the PLC in the same chassis? I didn't see any code examples for communications with the ControlLogix PLC, where this is defined.
  • What is the CPU usage in Logix Task Monitor?
  • What is the CPU usage of the network card? You can see this by accessing the web page for the Ethernet card while performing the test.

If possible, it would be helpful to benchmark this with another driver. Can you perform the same test with FactoryTalk Linx Enterprise and the FactoryTalk Live Data Test Client?

"The CPU usage grows to 26%" - this sentence is ambiguous; what is it growing from, and how are you measuring these numbers? Ideally, the measurement would be done in code so that others can easily execute the test, and you can be more precise in measuring specific code paths. But if you're using a tool, which tool, and how are you using it?

I am using Task Manager to monitor it.
When I start the application, the CPU usage of the process is 0; when I start reading, it grows.

Did something change? - the CPU usage number was previously 70% and yesterday it was 26%. It would be useful to post the code (and any other relevant conditions) used to produce any results.

The difference between the 70% and the 26% is that the 70% is our real product and the 26% is a basic test.
Our product allows the user to create many threads to read from the PLC. Each thread has its own client that reads from time to time: there are threads that read every second, and threads that read every 5 seconds. Each has a small number of tags; summing all the tags of the project, we have about 1500 tags. It is on version 1.0.13.
The simple test is a test that reads all the tags. It is on version 1.1.1.

About PWSys's questions, I'll need to check all this information with my client.

I have been out for a while. Is there any status update on this?

Any updates? My application is still running with high CPU usage, and it is not very fast.

I got this info about the PLC; I do not know if it helps:
Type: 1769-L36ERM CompactLogix 5370 Controller
Revision: 31.011

What is the network module?

The owner of the CompactLogix sent me this picture with its information:

[image: PLC information]

It has an embedded Ethernet port, and there is no additional module.

Thanks, Vitor.

Based on the docs, it looks like this is a standard CompactLogix port. I think these are only 100 Mbps, but I cannot find a speed listed in the docs. CompactLogix PLCs use the internal CPU both for networking and for running the PLC control code. This means that you need to be very, very careful how much load you place on the PLC.

If you go to the web page the network stack serves on port 80, you can navigate to the page that shows the load. On my ControlLogix L80, this is http://10.206.1.40:80/ -> Diagnostics -> Module Diagnostics. You will need to change the IP address to the one for the PLC. This page changes depending on the firmware, so you may have to explore a bit. It should show the current and maximum packet rates.

Note that the docs show that you can only have 120 TCP connections to this type of PLC. That is not very many :-(

I have never seen a CompactLogix that supported the large packet size (4002 bytes). That is not to say that they do not exist, but I have not seen one. If this PLC only supports the smaller packet size, then you will not be able to pack as many requests into one packet to the PLC. That will slow down the reads and increase the CPU load on the PC.

@jkoplo @timyhac, do you have any measurements of the overhead of using callbacks? In this case, I wonder if setting up the tags to use auto sync reads would be a better idea? That would eliminate all the cross-DLL marshalling/unmarshalling to trigger reads and only take that hit for the callback.

Vitor, the idea to test here (if Jody and Tim think it might help) is to set up your tags with automatic reads using the auto sync configuration. Create your tags with this attribute set and with a callback that looks for read complete events.

Vitor, is there any way you can get the same PLC type? It is important to make sure our tests are really testing the same configuration as in your customer's factory.


I don't have much in terms of benchmarks on callbacks but I figure that dotnet must use them for system calls on Windows/Linux for any low level resource interaction.

I'd be shocked if they were part of any sort of bottleneck for us. But that's just my intuition.


These CompactLogix PLCs do support Large Connections when running firmware 20.11 and up. They are closely related to L7 ControlLogix PLCs.
https://rockwellautomation.custhelp.com/app/answers/answer_view/a_id/1076270

Going to the webpage for the controller would be beneficial to know the embedded Ethernet card's utilization. What else is communicating with this PLC? A common scenario is using numerous PanelView terminals which consume multiple TCP connections. If this PLC is communicating with Ethernet/IP I/O devices that will consume additional resources. The diagnostics web page will indicate the percentage of utilization as well as the number of TCP and CIP connections in use and available.

What is the System Overhead Time Slice? The default is 20%. Can you try increasing this value? I've experienced unpredictable results when communicating with controllers that are overly taxed. Examples include running continuous tasks with a low System Overhead Time Slice. This effectively prioritizes executing logic over communications.

Ah, thanks for the update about the CompactLogix! I don't have one and have not used one for years. That is good news for the possible performance.

There is a fairly large gap in performance between the L7x series CPUs and the L8x series. I have an L55, an L61, and an L81 in my test lab, but not an L7x; I used to have an L71. The L81 can handle a heavy network load. That is in a ControlLogix chassis.

Every connected device uses at least one TCP connection (to some extent, even produced/consumed tags and remote IO). So 120 connections is not that many. Every laptop with RSLinx, other OPC server etc. uses at least one TCP connection.

Back to the performance... I am not a C#/.Net programmer, so I cannot come up with sample code. I could try to write up something in Java as that would be close to C# but things like callbacks might be quite different between Java and .Net. The rough outline would be this:

1. Create each tag with auto_sync_read_ms=1000 and a callback.   This will set up a read every second.   
2. The callback should log or store the value of the tag when called.   Note that this will be from another thread, so you will need to protect any shared data.
3. Create 1500 tag handles.
4. Watch PC CPU load.

Note that there is no awaiting at all in this model. The underlying C DLL will call you when the tag read is complete.
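In libplctag.NET terms, a hedged sketch of this outline (tag names, gateway, and PLC type are carried over from the earlier examples in this thread; adjust them for the real PLC):

    // Sketch of the outline above using the .NET wrapper: auto-sync reads
    // with a ReadCompleted callback and no awaiting. The callback runs on
    // a library thread, so shared data is protected with a lock.
    class AutoSyncReader
    {
        private readonly List<Tag<DintPlcMapper, int>> tags = new();
        private readonly Dictionary<string, int> latest = new();
        private readonly object sync = new();

        public void Start()
        {
            for (int i = 1; i <= 1500; i++)
            {
                var tag = new Tag<DintPlcMapper, int>
                {
                    Name = $"TAG_INT_{i:0000}",        // names from the earlier examples
                    Gateway = "192.168.1.150",
                    PlcType = PlcType.Micro800,        // adjust for the real PLC
                    Protocol = Protocol.ab_eip,
                    Timeout = TimeSpan.FromSeconds(5),
                    AutoSyncReadInterval = TimeSpan.FromMilliseconds(1000), // step 1
                };
                tag.ReadCompleted += (s, e) =>
                {
                    lock (sync) { latest[tag.Name] = tag.Value; }           // step 2
                };
                tag.Initialize();
                tags.Add(tag);                         // keep the tags alive (step 3)
            }
        }
    }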

Vitor, the idea to test here (if Jody and Tim think it might help) is to set up your tags with automatic reads using the auto sync configuration. Create your tags with this attribute set and with a callback that looks for read complete events.

Vitor, is there any way you can get the same PLC type? It is important to make sure our tests are really testing the same configuration as in your customer's factory.

I think I can test it on my client's PC. I need to talk to him first, but he will probably allow me to test there. Maybe next week I can test there with the auto sync configuration.

What is the System Overhead Time Slice? The default is 20%. Can you try increasing this value? I've experienced unpredictable results when communicating with controllers that are overly taxed. Examples include running continuous tasks with a low System Overhead Time Slice. This effectively prioritizes executing logic over communications.

I'll check this property and try increasing it.

Thanks for your help

I finally tested the async read at my client's PC.

What is the System Overhead Time Slice?

It is at 45%.

I tested with a simple project, with 1538 tags.
The CPU usage is almost the same, about 25-27%.

But it reads the tags every 1 second correctly; it is a great difference compared to reading them all at once.
It takes some time to initialize all the tags, about 7-8 seconds.

A question: does auto sync work for all PLCs except the Micro800?
I tested with the Micro800 with many tags (1500), and it takes forever to initialize; it is fast up to 200-300 tags, then it starts to slow down.

In my tests, I saw that initializing all the tags takes several seconds. The time you saw is expected since the tag creation process is waiting for completion as each tag is created.

I am not sure I understand what you mean by "a great difference compared to reading them all at once." If you are using auto-sync, then the library will try to spread out the reads so that they are not all at the same time.

Auto-sync works for all tag types and for all PLC types. The Micro800 does not allow requests to be packed into one packet so every tag read is done one at a time. You will not be able to read 1500 tags from a Micro800 in one second. If it takes 3ms to read a tag, then 300 tags will take one second to read.

The CPU load of 25-27% is much better than the 70+% that the client was reporting. At least I think that is what you were saying earlier. Does this solve the problem? 1500 tags is a lot. Note that the library will take some CPU time. My own tests showed that my laptop was using more than a bit of CPU using just C code.

Thanks for running these tests!

I updated my application to the latest libplctag, using auto read.
The CPU usage seems to be OK; it is identical to my simple project, about 25%-30% on my client's PC.
The auto read is very fast. It reads 1500 tags in about 300 ms.

A question: why is it so much faster than Read and ReadAsync?
Compared to Read, I imagine that it waits for one read to complete before calling the next one, which is why it is slower; but ReadAsync does not need to wait, and it is still slower than auto read.

My client will still test it there, but I hope he does not complain about 25-30% CPU usage.

Thank you for the help

Good to hear that this is starting to work! What is the read period you are using for the tags?

I am not sure exactly why the speed is so much better, but I strongly suspect that it is due to some combination of factors:

  • The .Net version does a lot of memory allocation for the async version.
  • Each time the async version is run, there is a call from .Net through to the C DLL to start a read and at least one more to get the status. For each tag read. With the callback version, there is only a single call back out of the C DLL when a read completes.

I do not know which of these factors might be more important, but your results are very interesting!

A different language wrapper might show different results. I suspect that a C application would be different and not show as much difference.

My 2c is that different things are being measured. Can you post code/information on how you measured this?

Note: internally, ReadAsync piggy-backs off the event/callback system - it doesn't poll.

I hope he does not complain about 25-30% CPU usage.

I still believe that investigating the root problem your customer is experiencing will be helpful in understanding the real issue. CPU usage may be a contributing factor, or even a side effect, but it isn't the root problem in itself.

Best of luck with your project!