Unexpected CPU load behavior on different publishing rates
ernestum opened this issue
Expecting the CPU load to grow monotonically with the number of messages I publish per second, I made this plot of CPU load vs. target publishing frequency, consisting of around 18k samples:

[plot: CPU load vs. target publishing frequency, ~18k samples]

I did not expect this.
Code to reproduce (on a machine that does not do any other work):
import random
import signal
import time

import psutil
from zerocm import ZCM

zcm = ZCM("")
if not zcm.good():
    exit()

# Exit cleanly on Ctrl+C.
signal.signal(signal.SIGINT, lambda sig, frame: exit())

msg = some_zcm_message()  # placeholder: construct whatever ZCM message type you have

while True:
    frequency = random.randint(10, 10000)
    for _ in range(3):  # take three samples per frequency
        psutil.cpu_percent(interval=0)  # reset psutil's CPU usage counter
        t0 = time.time()
        while time.time() - t0 < 3:
            zcm.publish("test_channel", msg)
            time.sleep(1 / frequency)
        cpu_usage = psutil.cpu_percent(interval=0)  # average usage since the reset
        print("{}\t{}".format(cpu_usage, frequency))
    print("\n")  # separate the output for each frequency
Note that I just blindly sleep for the computed period, ignoring the time the publish call itself takes. This means my actual publishing frequency may not match the target frequency; more in-depth tests should also record the actual publishing frequency achieved, e.g. as sketched below.
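A minimal sketch of how such a test could record the achieved rate, simply counting publish calls over the measured window (the measure name and its parameters are illustrative, not part of the script above):

import time

def measure(zcm, msg, target_frequency, duration=3.0):
    # Count actual publish calls over the window so the achieved
    # frequency can be reported alongside the target frequency.
    published = 0
    t0 = time.time()
    while time.time() - t0 < duration:
        zcm.publish("test_channel", msg)
        published += 1
        time.sleep(1 / target_frequency)
    return published / (time.time() - t0)  # achieved frequency in Hz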
Along with the tests, I would like to introduce a collection of benchmark scripts that let us evaluate whether a new feature or a change to an implementation detail affected performance. Would you be interested in a PR for that? @jbendes @olsoni ?
This sounds amazing. Actual benchmark data and feature checking sounds incredible.
OK, I think we will need some standardized infrastructure to make any benchmarks comparable. E.g. we never know how much other load is on those Travis machines, so we need dedicated hardware. I am experimenting with an older i7 machine that is used as a dedicated GitLab runner for our internal ZCM fork. However, I would love to get the benchmarks upstream and connect some dedicated hardware to the upstream repo to make everything more streamlined. I will have a look at what GitHub can do here ...
This is to be expected. Simply sleeping after a publish call guarantees that you will get this shape of target/expected frequency curve. The publish call takes a nonzero amount of time, which means you're not actually ensuring that each loop iteration consumes the same amount of time. You should grab the time on each loop and sleep only for however long is left until the next scheduled publish, to ensure a steady publish frequency (sketched below). As for CPU load, the most work we've ever done for determining this is looking at htop :) so you're in uncharted territory.
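For illustration, a sketch of that deadline-based loop, assuming the same zcm, msg, and frequency setup as in the script above (sleep until the next scheduled publish time instead of for a fixed period):

import time

period = 1 / frequency  # desired time between publishes
next_deadline = time.time()
t0 = time.time()
while time.time() - t0 < 3:
    zcm.publish("test_channel", msg)
    next_deadline += period
    # Sleep only for the time remaining until the next deadline,
    # absorbing however long the publish call itself took.
    remaining = next_deadline - time.time()
    if remaining > 0:
        time.sleep(remaining)

If publishing takes longer than the period, remaining goes negative and the loop simply doesn't sleep, so the achieved rate saturates instead of silently drifting below the target.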
I started with htop too but then found it not very reliable ...
I looked into the CI issue for the benchmarks and could not find any automated solution except using GitLab Enterprise, which I do not have ... so for now I will run the benchmarks from our internal fork and then publish the results somehow.
Sounds good to me. As long as we post the full comparison and the exact hardware version, I don't think it matters much where we test it for now.
Seeing these plots, I would suggest a simple sanity check to rule out any Python black magic that may distort the results: replacing the publish call in the benchmark with an expression of fixed runtime cost (a simple busy-wait may suffice) should yield a load proportional to the frequency (i.e. the ideal case for the publishing behavior). A sketch follows below.
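A minimal sketch of such a stand-in, assuming a busy-wait of fixed duration (fake_publish and the 100 µs cost are arbitrary illustrative choices):

import time

def fake_publish(cost=100e-6):
    # Busy-wait for a fixed amount of CPU time instead of publishing,
    # so every loop iteration has a constant, known cost.
    t0 = time.perf_counter()
    while time.perf_counter() - t0 < cost:
        pass

Swapping zcm.publish("test_channel", msg) for fake_publish() in the benchmark should then produce a CPU load that grows roughly linearly with frequency; any remaining nonlinearity would point at the timing loop or the measurement rather than at ZCM itself.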
Can also just test it in Julia or C, which should be very fast.