
unmark — Painless micro-benchmarks

%%VERSION%%

WORK-IN-PROGRESS. API COULD CHANGE.

Unmark is a benchmarking library with a focus on the small end of the scale.

Its essential purpose is to verify assumptions about the influence of certain programming patterns on program efficiency, answering questions like "Is avoiding this allocation worthwhile?", "Is this indirect call really bad?", or "Is the loop thrashing the cache?"

It can also be used to establish performance baselines, and to track the evolution of performance in a given code base.

Special attention is devoted to evolving the benchmark code. It is easy to save baselines, add ad-hoc benchmarks, or reuse poorly organized benchmarking code. It is possible to compare results within a single run, or across multiple runs.

Unmark is less suitable for benchmarking entire systems, and particularly unsuitable for benchmarking concurrency.

Unmark is a product of stealing great ideas from Criterion and Core_bench. As a consequence, it shares many similarities with both.

Unmark is distributed under the ISC license.

Homepage: https://github.com/pqwy/unmark

Structure

Unmark separates benchmark definition, running and analysis:

  • unmark library (src/) is needed to define benchmarks.
  • unmark.cli library (src-cli/) is needed to create standalone programs that run them.
  • unmark.papi library (src-papi/) provides access to hardware performance counters using PAPI.
  • unmark executable (src-bin/) analyses the results and prints reports on the command-line.
  • src-python/ provides access to the benchmark results from Python. It is intended to be used from Jupyter.

The unmark library depends only on unix and logs. Other OCaml bits are less conservative with their dependencies.
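
If the benchmark executable is built with dune (the stanza below is only a sketch; adapt it to whatever build setup is actually in use), something along these lines pulls in the libraries listed above:

(executable
 (name shebang)
 (libraries unmark unmark.cli unmark.papi))

unmark.papi can be left out when hardware counters are not needed.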

Python code needs Python 3 and numpy. It depends on matplotlib for plotting, and it knows how to make use of scipy if installed.

Documentation

See the interface files, or the online documentation.

Python files contain doc-strings, accessible from ipython with ??unmark.

Examples

Hello world

(* shebang.ml *)
open Unmark
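
(* f, i and o stand for the functions under test; they are assumed to be
   defined elsewhere in this file. *)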

let suite = [
  bench "eff" f;
  group "gee" [ bench "eye" i; bench "ohh" o ]
]

let () = Unmark_cli.main "The Whole Shebang" suite

$ ./shebang
$ ./shebang --help

Hello world, now with a mess

Show only time:

$ ./shebang -- --counters time

Do a baseline run, saving the results:

$ nice -n -10 ./shebang --out bench.json --note 'when it worked'

Change the implementations. Decide it's too much work to get both versions into the same executable. Instead run the benchmarks again, appending to the results:

$ ./shebang --out bench.json --note 'did stuff'

Show everything so far:

$ unmark < bench.json

Intensely work on the functions i and o. Nothing else changed, so run just the group containing them, appending to the results:

$ ./shebang --filter gee --out bench.json --note turbo

Show gee across all three runs:

$ unmark --filter gee < bench.json

Change again. Compare the last run with a throwaway run of gee:

$ (tail -n1 bench.json && ./shebang --filter gee --out) | unmark

Plugging in measurements

More details about the GC:

open Unmark
let () =
  Unmark_cli.main "Gulp" [ ... ]
    ~probe:Measurement.Probe.gc_counters
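    (* gc_counters extends each measurement with OCaml GC statistics. *)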

Hardware counters (cache misses):

open Unmark
let () =
  Unmark_cli.main "Tikk tokk" [ ... ]
    ~probe:(Unmark_papi.of_events Papi.[L1_TCM; L2_TCM; L3_TCM])
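    (* L1_TCM, L2_TCM and L3_TCM are the PAPI preset events for total
       cache misses in the L1, L2 and L3 caches. *)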

Python

The environment needs to point to the Python code:

PYTHONPATH="$(opam var unmark:share)/python:${PYTHONPATH}" jupyter-notebook

Then start with

%pylab inline
from unmark import *

runs = of_json_file ('bench.json')
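# Each element of runs is one benchmark run appended to bench.json.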

Inspect the fit:

r0 = runs[0]
r0.filter ('gee').reglines ()

eff = r0['eff']
eff.reglines ()

Do a clever analysis:

mx = eff['iterations', 'time', 'L1_TCM']
secret_sauce (mx)

... and open an issue to share it. ;)
