ZeroCM / zcm

Zero Communications and Marshalling

Home Page: http://zerocm.github.io/zcm/

Unexpected CPU load behavior on different publishing rates

ernestum opened this issue · comments

Expecting the CPU load to grow monotonically with the number of messages I publish per second, I made this plot consisting of around 18k samples:
[plot: CPU load vs. target publishing frequency, ~18k samples]
I did not expect this.
Code to reproduce (on a machine that does not do any other work):

import signal
import time
import psutil
import random
from zerocm import ZCM

zcm = ZCM("")
if not zcm.good():
    exit()

signal.signal(signal.SIGINT, lambda sig, frame: exit())

msg = some_zcm_message()  # placeholder: construct any ZCM message instance here

while True:
    frequency = random.randint(10, 10000)  # target publishing rate in Hz
    for _ in range(3):
        psutil.cpu_percent(interval=0)  # prime the counter; the next call measures usage from this point
        t0 = time.time()
        while time.time() - t0 < 3:
            zcm.publish("test_channel", msg)
            time.sleep(1/frequency)  # blind sleep; ignores the time spent in publish itself
        cpu_usage = psutil.cpu_percent(interval=0)

        print("{}\t{}".format(cpu_usage, frequency))
    print("\n")

Note that I just blindly wait for the computed period, ignoring the time the publish call itself takes. This means my actual publishing frequency might not match the target frequency. More in-depth tests should also record the actual publishing frequency; a sketch of such a measurement follows below.
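For reference, a minimal sketch of how the achieved rate could be recorded, assuming a hypothetical helper measure_actual_frequency and the same naive sleep strategy as the benchmark above:

import time

def measure_actual_frequency(publish_fn, target_hz, window_s=3.0):
    """Call publish_fn at a naive target rate for window_s seconds and
    return the frequency that was actually achieved (calls per second)."""
    count = 0
    t0 = time.time()
    while time.time() - t0 < window_s:
        publish_fn()
        count += 1
        time.sleep(1.0 / target_hz)  # same blind sleep as in the benchmark above
    return count / (time.time() - t0)

# usage sketch (names are illustrative):
# actual_hz = measure_actual_frequency(lambda: zcm.publish("test_channel", msg), frequency)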
Along with the tests, I would like to introduce a collection of benchmark scripts that let us evaluate whether a new feature or an implementation change affects performance. Would you be interested in a PR for that? @jbendes @olsoni ?

This sounds amazing. Actual benchmark data and feature checking sounds incredible.

OK, I think we will need some standardized infrastructure to make any benchmarks comparable. For example, we never know how much other load is on those Travis machines, so we need dedicated hardware. I am experimenting with an older i7 machine that serves as a dedicated GitLab runner for our internal ZCM fork. However, I would love to get the benchmarks upstream and connect some dedicated hardware to the upstream repo to make everything more streamlined. I will have a look at what GitHub can do here ...

In the meantime: here is the same plot as above but run on the i7:
[plot: CPU load vs. target publishing frequency, run on the i7]
This time I also recorded the actual publishing frequency so we can compare it against the target frequency:
[plot: actual vs. target publishing frequency]

And here we have the CPU load vs the actual publishing frequency:
[plot: CPU load vs. actual publishing frequency]

This is to be expected. Simply sleeping after a publish call guarantees that you will get this shape of target/actual frequency curve. The publish call takes a nonzero amount of time, which means each loop iteration does not consume the same amount of time. You should grab the time on each loop and sleep only for whatever time remains, to ensure a steady publish frequency; see the sketch below. As for CPU load, the most work we've ever done for determining this is looking at htop :) so you're in uncharted territory.
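For illustration, a minimal sketch of such a rate-controlled loop (assuming a hypothetical helper publish_at_rate, not part of ZCM itself):

import time

def publish_at_rate(zcm, channel, msg, frequency, duration_s=3.0):
    """Publish msg on channel at roughly `frequency` Hz for duration_s seconds.

    The next publish is scheduled at an absolute deadline and we sleep only
    for the time remaining, so the cost of the publish call itself does not
    stretch the loop period. Returns the achieved frequency."""
    period = 1.0 / frequency
    t_start = time.time()
    next_due = t_start
    count = 0
    while time.time() - t_start < duration_s:
        zcm.publish(channel, msg)
        count += 1
        next_due += period
        remaining = next_due - time.time()
        if remaining > 0:
            time.sleep(remaining)
    return count / (time.time() - t_start)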

I started with htop too but then found it to be not very reliable ...

I looked into the CI issue for benchmarks and could not find any automatic solution except for using GitLab Enterprise, which I do not have ... so for now I will run the benchmarks from our internal fork and then publish the results somehow.

Sounds good to me. As long as we post the full comparison and the exact hardware version, I don't think it matters much where we run the tests for now.

I wanted to make sure that the benchmarks are reproducible on the same machine. See here for an overlay of the old data (black) over another run of the same benchmark (red):
[plot: overlay of the previous run (black) over a repeated run of the same benchmark (red)]
Looks good to me ...

Seeing these plots, I would suggest a simple sanity check to rule out any Python black magic that may distort the results: replacing the publish call in the benchmark with an expression of fixed runtime cost (maybe a simple busy-wait suffices) should yield a load proportional to the frequency (i.e. the ideal case for the publishing behavior). A sketch of this is shown below.
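A minimal sketch of such a sanity check, assuming a hypothetical busy_wait helper as the fixed-cost stand-in for zcm.publish:

import time

def busy_wait(duration_s=0.0001):
    """Spin for a fixed wall-clock duration, as a stand-in for zcm.publish
    with a constant, known cost."""
    t0 = time.perf_counter()
    while time.perf_counter() - t0 < duration_s:
        pass

# In the benchmark loop, replace
#     zcm.publish("test_channel", msg)
# with
#     busy_wait()
# and the measured CPU load should grow roughly linearly with the achieved
# call frequency if no Python-side effect is distorting the results.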

Can also just test it in Julia or C, which should be very fast.