Unexpected CPU load behavior on different publishing rates
ernestum opened this issue
Expecting the CPU load to grow monotonically with the number of messages I publish per second, I made this plot of CPU load vs. target publishing frequency, consisting of around 18k samples:

[plot: CPU load vs. target publishing frequency, ~18k samples]

I did not expect this.
Code to reproduce (on a machine that does not do any other work):
import random
import signal
import time

import psutil
from zerocm import ZCM

zcm = ZCM("")
if not zcm.good():
    exit()

# Exit cleanly on Ctrl+C.
signal.signal(signal.SIGINT, lambda sig, frame: exit())

msg = some_zcm_message()  # placeholder: construct whatever ZCM message type you have

while True:
    frequency = random.randint(10, 10000)
    for _ in range(3):  # take three samples per frequency
        psutil.cpu_percent(interval=0)  # reset psutil's CPU usage counter
        t0 = time.time()
        while time.time() - t0 < 3:
            zcm.publish("test_channel", msg)
            time.sleep(1 / frequency)
        cpu_usage = psutil.cpu_percent(interval=0)  # average usage since the reset
        print("{}\t{}".format(cpu_usage, frequency))
    print("\n")  # separate the output for each frequency
Note that I just blindly sleep for the computed period, ignoring the time the publish call itself takes. This means my actual publishing frequency may not match the target frequency; more in-depth tests should also record the actual publishing frequency achieved, e.g. as sketched below.
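A minimal sketch of how such a test could record the achieved rate, simply counting publish calls over the measured window (the measure name and its parameters are illustrative, not part of the script above):

import time

def measure(zcm, msg, target_frequency, duration=3.0):
    # Count actual publish calls over the window so the achieved
    # frequency can be reported alongside the target frequency.
    published = 0
    t0 = time.time()
    while time.time() - t0 < duration:
        zcm.publish("test_channel", msg)
        published += 1
        time.sleep(1 / target_frequency)
    return published / (time.time() - t0)  # achieved frequency in Hz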
Along with the tests, I would like to introduce a collection of benchmark scripts that let us evaluate whether a new feature or a change to an implementation detail affected performance. Would you be interested in a PR for that? @jbendes @olsoni ?
This sounds amazing. Actual benchmark data and feature checking sounds incredible.
OK, I think we will need some standardized infrastructure to make any benchmarks comparable. E.g. we never know how much other load is on those Travis machines, so we need dedicated hardware. I am experimenting with an older i7 machine that is used as a dedicated GitLab runner for our internal ZCM fork. However, I would love to get the benchmarks upstream and connect some dedicated hardware to the upstream repo to make everything more streamlined. I will have a look at what GitHub can do here ...
This is to be expected. Simply sleeping after a publish call guarantees that you will get this shape of target/expected frequency curve. The publish call takes a nonzero amount of time, which means you're not actually ensuring that each loop iteration consumes the same amount of time. You should grab the time on each loop and sleep only for however long is left until the next scheduled publish, to ensure a steady publish frequency (sketched below). As for CPU load, the most work we've ever done for determining this is looking at htop :) so you're in uncharted territory.
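For illustration, a sketch of that deadline-based loop, assuming the same zcm, msg, and frequency setup as in the script above (sleep until the next scheduled publish time instead of for a fixed period):

import time

period = 1 / frequency  # desired time between publishes
next_deadline = time.time()
t0 = time.time()
while time.time() - t0 < 3:
    zcm.publish("test_channel", msg)
    next_deadline += period
    # Sleep only for the time remaining until the next deadline,
    # absorbing however long the publish call itself took.
    remaining = next_deadline - time.time()
    if remaining > 0:
        time.sleep(remaining)

If publishing takes longer than the period, remaining goes negative and the loop simply doesn't sleep, so the achieved rate saturates instead of silently drifting below the target.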
I started with htop too but then found it not very reliable ...
I looked into the CI issue for the benchmarks and could not find any automated solution except using GitLab Enterprise, which I do not have ... so for now I will run the benchmarks from our internal fork and then publish the results somehow.
Sounds good to me. As long as we post the full comparison and the exact hardware version, I don't think it matters much where we test it for now.
Seeing these plots, I would suggest a simple sanity check to rule out any Python black magic that may distort the results: replacing the publish call in the benchmark with an expression of fixed runtime cost (a simple busy-wait may suffice) should yield a load proportional to the frequency (i.e. the ideal case for the publishing behavior). A sketch follows below.
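A minimal sketch of such a stand-in, assuming a busy-wait of fixed duration (fake_publish and the 100 µs cost are arbitrary illustrative choices):

import time

def fake_publish(cost=100e-6):
    # Busy-wait for a fixed amount of CPU time instead of publishing,
    # so every loop iteration has a constant, known cost.
    t0 = time.perf_counter()
    while time.perf_counter() - t0 < cost:
        pass

Swapping zcm.publish("test_channel", msg) for fake_publish() in the benchmark should then produce a CPU load that grows roughly linearly with frequency; any remaining nonlinearity would point at the timing loop or the measurement rather than at ZCM itself.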
Can also just test it in Julia or C, which should be very fast.