Tarsnap / kivaloo

Kivaloo is a collection of utilities which together form a data store associating keys of up to 255 bytes with values of up to 255 bytes.

Home Page: http://www.tarsnap.com/kivaloo.html

benchmark time period

gperciva opened this issue

Do you have a particular intuition behind picking a specific time range (such as 50 to 60 seconds for bulk_update)?

In the attached PNGs, it looks like the number of operations per second in bulk_update is randomly distributed. Here are 3 tests (I cancelled the last one a little early).

I'd be tempted to use the median, or the 25% & 75% quartiles, rather than the mean of a specific time range.
(As it happens, I spent the past 2 days working on perftests for spiped, so I have this in my mind.)
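
Roughly what I have in mind -- treating each line of the benchmark output as one ops/sec sample and taking the whole run (foo.txt here is just a stand-in name):

$ sort -n foo.txt | awk '{a[NR]=$1} END {print "q1:", a[int(NR/4)], "median:", a[int(NR/2)], "q3:", a[int(3*NR/4)]}'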

[Graphs bulk1, bulk2, bulk3: operations per second over time for the three bulk_update runs]

BTW, those benchmarks were done on my FreeBSD desktop. Would it be helpful if I used some standard EC2 hardware, like c6g.medium or c5.large?

My concern with benchmarks is "warming up" -- you can see in those graphs that the performance in the first second is higher than later, presumably because data structures are clean and kvlds isn't being slowed down by needing to evict pages from memory. How long this warmup period takes will depend on the benchmark, so I went with a conservative value.
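
(One way to quantify that, assuming -- as with the awk commands below -- one ops/sec sample per line:

$ awk 'NR <= 5 {early += $1; ne++} NR >= 30 && NR < 60 {late += $1; nl++} END {printf "first 5s: %.0f  seconds 30-60: %.0f\n", early/ne, late/nl}' foo.txt

A noticeably higher first number means the warmup effect is still visible in those early samples.)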

I'm not expecting to run these benchmarks very often -- they exist mainly for comparing between versions -- so I'm not too concerned about them taking a while to run.

And yes, for comparison purposes these need to run on standard hardware. But no rush right now.

Sure, but the means of 50 to 60 are quite far apart in the first two examples:

$ awk '{if ((NR >= 50) && (NR < 60)) sum+=$0} END {print sum/10}' foo.txt 
266913
$ awk '{if ((NR >= 50) && (NR < 60)) sum+=$0} END {print sum/10}' bar.txt 
322541

whereas the medians of 10 to 60 are closer (although admittedly not as close as I was expecting).

$ awk '{if ((NR >= 10) && (NR < 60)) a[i++] = $1} END {print a[int(i/2)]}' foo.txt 
294197
$ awk '{if ((NR >= 10) && (NR < 60)) a[i++] = $1} END {print a[int(i/2)]}' bar.txt 
335462

Did you forget a sort when calculating the medians?

Oops. Yeah, that gives much more similar values. Invoking a useless cat since it adds clarity:

$ cat foo.txt | awk '{if ((NR >= 10) && (NR < 60)) print $1}' | sort | awk '{a[i++]=$1} END {print a[int(i/2)]}'
324656
$ cat bar.txt | awk '{if ((NR >= 10) && (NR < 60)) print $1}' | sort | awk '{a[i++]=$1} END {print a[int(i/2)]}'
329101
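
Pedantic note: plain sort is sorting lexicographically, which only happens to agree with numeric order here because all the values are six digits; to be safe it would want sort -n, i.e. something like:

$ awk '(NR >= 10) && (NR < 60) {print $1}' foo.txt | sort -n | awk '{a[i++]=$1} END {print a[int(i/2)]}'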

BTW, c5.large produces the same type of timing data:

[Graph c5: operations per second over time for bulk_update on c5.large]

Sounds good to me.

BTW, the low performance on c5.large is because reading the clock is ridiculously slow. Or rather, it is with the default settings -- adjusting the timecounter used on FreeBSD speeds things up dramatically. I need to dig into that at some point.

Do you mean "the default clock method used by monoclock.c is slow on c5.large", or do you mean "there's something in the kernel that's sub-optimal"?

FWIW, c6g.large gets three times the operations per second with bulk_update.

[Graph c6g: operations per second over time for bulk_update on c6g.large]

The FreeBSD kernel has a setting which tells it where to get the time from, and the default is suboptimal, at least for x86.
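
If I'm remembering the knob correctly, it's the kern.timecounter sysctls -- something like:

$ sysctl kern.timecounter.choice      # list the available timecounters, with quality scores
$ sysctl kern.timecounter.hardware    # show the one currently in use
$ sysctl kern.timecounter.hardware=TSC-low   # switch (as root), e.g. to the TSC, if it appears in the choice list

The exact names of the available timecounters vary with the hardware/hypervisor.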