newspeed
- A newer OpenSSL speed
This is a WIP project reimplementing openssl speed
to use modern
APIs (e.g. EVP_PKEY
).
Table of Contents
Design
newspeed
runs a specified set of operations a number of times in a
loop and records the number of elapsed CPU cycles.
The main loop can be described through the following pseudocode:
main_benchmarking_loop(operation)
for o in operation.out_count
for i in operation.in_count
operation.pre_hook()
SAMPLING_START()
for x in iterations_per_sample
operation.run()
SAMPLING_END()
sample = SAMPLING_DELTA / iterations_per_sample
operation.post_hook()
The reason why we have two nested loops (out_count
and in_count
) is
to abide to the recommendations for some of the alternative sampling
providers.
To improve the statistical quality of each sample, we further average the measured elapsed CPU cycles over a number of iterations per sample in the innermost loop.
The total number of runs for each operation is thus equal to
operation.out_count * operation.in_count * iterations_per_sample
Even tough it is possible to redefine the number of iterations in out_count and in_count for each defined operation, currently all operation definitions default to use global parameters set at compile time to define these values.
When out_count > 1
(which is not the default), part of the code in
newspeed
will attempt a statistical comparison of the sampled values,
according to the guidelines described in the
whitepaper
which inspired one of the alternative sampling providers.
Caveats
- The list of operations supported is currently limited to
EVP_PKEY
keygenEVP_PKEY
key derivation (e.g. DH, ECDH)EVP_PKEY
sign/verifyEVP_DigestSign
/EVP_DigestVerify
- The number of iterations per operations is fixed at build time
- The current code would benefit a lot from refactoring and restructuring, especially to make it easier to add support for new operations and to maintain existing ones
- The output on the TTY is cryptic and hard to use, and improving it is part of future plans for the tool (the JSON output is what should be used for statistics)
Prerequisites
Perf
The default method for measuring elapsed cycles in newspeed
is based
on the Linux perf
tool, as it has the benefit of being portable to
any platform we were interested in.
On Ubuntu/Debian:
apt-get install linux-tools-common linux-tools-generic \
linux-tools-$(uname -r)
This is a requiremnt at build time, unless a different sampling provider is selected (alternatives can be selected through preprocessors defines, but they underwent less testing).
Also, as a run-time requirement, depending on the security environment,
using perf
as an unprivileged user might require adding
CAP_SYS_ADMIN
capability or allowing unprivileged access to perf
event counters:
echo 1 | sudo tee /proc/sys/kernel/perf_event_paranoid
Build
Build-time configuration
src/newspeed_config.h
includes defines affecting the core
functionality of newspeed
.
Number of runs for each operation
Currently the number of runs for each operation is statically determined at compile time as illustrated in the Design section, and the total number of runs for each operation is equal to:
operation.out_count * operation.in_count * iterations_per_sample
-
iterations_per_sample
depends on#define HYPERLOOP 7
iterations_per_sample = 2**HYPERLOOP # 2 to the power of HYPERLOOP
-
operation.in_count
defaults to#define OP_DEFAULT_IN_COUNT 100
for each defined operation;
-
operation.out_count
defaults to#define OP_DEFAULT_OUT_COUNT 1
for each defined operation; if greater than 1, part of the code in
newspeed
will attempt a statistical comparison of the sampled values, according to the guidelines described in the whitepaper which inspired one of the alternative sampling providers.
Alternative sampling providers
#define SAMPLING_INTEL_WHITEPAPER 1
#define SAMPLING_SUPERCOP 2
#define SAMPLING_MODIFIED_SUPERCOP 3
#define SAMPLING_PERF 4
#define SAMPLING SAMPLING_PERF
The SAMPLING
declaration selects one of the implemented sampling
providers.
The default is SAMPLING_PERF
as it is portable and more reliable on
the environments we targeted, with the caveat that it adds some
build-time and run-time requirements.
The other providers are currently x86
only, and may also require
tweaking the number of runs per operation to increase the reliability of
the measurements.
Build instructions
export OPENSSL_PREFIX=/opt/openssl-master
TMP_BUILD_DIR=./build
rm -rf ${TMP_BUILD_DIR}; mkdir -p ${TMP_BUILD_DIR}
cd ${TMP_BUILD_DIR}
cmake -DCMAKE_BUILD_TYPE=Debug \
-DOPENSSL_ROOT_DIR=${OPENSSL_PREFIX} \
-DCMAKE_INSTALL_PREFIX:PATH=${OPENSSL_PREFIX} \
${CMAKE_EXTRA_OPTS} ..
make ${MAKE_OPTS}
# Optionally install newspeed to ${OPENSSL_PREFIX}/bin/
sudo make install
Usage
$OPENSSL_PREFIX/bin/newspeed
Usage: newspeed [options]
Valid options are:
-help Display this summary
-engine val Use engine, possibly a hardware device
-evp_pkey_keygen val Benchmark EVP_PKEY key-pair generation
-evp_pkey_derive val Benchmark EVP_PKEY derive (DH) operation
-evp_pkey_dss val Benchmark EVP_PKEY digital signature scheme (sign+verify)
-evp_pkey_sign val Benchmark EVP_PKEY (DSS) sign operation
-evp_pkey_verify val Benchmark EVP_PKEY (DSS) verify operation
-evp_digest_dss val Benchmark EVP_DigestSign/DigestVerify digital signature scheme
-evp_digestsign val Benchmark EVP_DigestSign operation
-evp_digestverify val Benchmark EVP_DigestVerify operation
-json outfile Output json stats to file
-noop Benchmark noop operation
All the -evp_*
options take as argument the name of the algorithm on
which to operate, with the following syntax:
EVP_PKEY_EC:<curve_name>
performs the operation on the elliptic curve identified by the specified<curve_name>
. For a list of valid curve names:$OPENSSL_PREFIX/bin/openssl ecparam -list_curves
EVP_PKEY_RSA:<bits>
performs the operation using an RSA key with the specified bit length.EVP_PKEY_DSA:<bits>
performs the operation using a DSA key with the specified bit length.- Any other argument is directly fed through OpenSSL methods to
retrieve a corresponding
EVP_PKEY
implementation.
Most of the time -evp_digest*
and the corresponding
-evp_pkey_{sign|derive|dss}
are interchangeable, as the
EVP_DigestSign
API wraps around the EVP_PKEY
API.
The notable exception is for ED25519
signatures, which in OpenSSL
1.1.1 are implemented directly through the EVP_DigestSign
API and do
not support EVP_PKEY_sign
.
NOTE: An ENGINE
might expose a cryptosystem through different APIs
than upstream OpenSSL: e.g. with libsuola, ED25519
is
accessible through both the EVP_DigestSign
API and the EVP_PKEY_sign
API.
Notes
- NOTE: Don't load more than one engine for benchmarking!
- NOTE:
-engine
is optional, but should be the first option when loading anENGINE
!
Both recommendations are not strictly enforced, as for debugging and development it might be useful to load more engines and in different orders.
Examples
Benchmark X25519
$OPENSSL_PREFIX/bin/newspeed \
-engine libsuola-hacl \
-evp_pkey_keygen X25519 \
-evp_pkey_derive X25519 \
-json results.json
Benchmark ED25519
$OPENSSL_PREFIX/bin/newspeed \
-engine libsuola-sodium \
-evp_pkey_keygen ED25519 \
-evp_digest_dss ED25519 \
-json results.json
Benchmark NIST-P256 EC
$OPENSSL_PREFIX/bin/newspeed \
-evp_pkey_keygen EVP_PKEY_EC:prime256v1 \
-evp_pkey_derive EVP_PKEY_EC:prime256v1 \
-evp_digest_dss EVP_PKEY_EC:prime256v1 \
-json results.json
Benchmark RSA keygen
$OPENSSL_PREFIX/bin/newspeed \
-evp_pkey_keygen EVP_PKEY_RSA:2048 \
-json results.json
Benchmark DSA sign
$OPENSSL_PREFIX/bin/newspeed \
-evp_pkey_sign EVP_PKEY_DSA:1024 \
-json results.json
Benchmark several different operations at once
$OPENSSL_PREFIX/bin/newspeed \
-engine libsuola-hacl \
-evp_pkey_keygen EVP_PKEY_EC:prime256v1 \
-evp_pkey_keygen X25519 \
-evp_pkey_keygen ED25519 \
-evp_pkey_derive EVP_PKEY_EC:prime256v1 \
-evp_pkey_derive X25519 \
-evp_digest_dss EVP_PKEY_EC:prime256v1 \
-evp_digest_dss ED25519 \
-json results.json
$OPENSSL_PREFIX/bin/newspeed \
-engine libsuola-hacl \
-evp_pkey_keygen EVP_PKEY_RSA:1024 \
-evp_pkey_derive X25519 \
-evp_pkey_sign ED25519 \
-evp_digest_dss EVP_PKEY_DSA:1024 \
-evp_pkey_dss EVP_PKEY_EC:prime256v1 \
-json results.json
JSON Output
The preferred output format for newspeed
benchmarks is JSON and this
sections aims at briefly describing the adopted schema.
A newspeed
JSON results file contains only one object, describing
metadata related to the execution of newspeed
(i.e. pid, hostname,
date, openssl version, command line arguments, loaded engine).
The operations
member contains an array of objects describing each
operation benchmarked by the newspeed
execution: each operation
object has metadata describing the algorithm name, operation name, and
the out_count
, in_count
and iterations_per_sample
parameters
previously explained.
The RUNS
member of a operation
object contains an array (of length
out_count
) of objects composed by a numerical index j
(i.e.
0 < j < out_count
) and a values
array containing in_count
CPU
cycles samples (each averaged over iterations_per_sample
operation
runs).
{
"pid": 24913,
"machine": "picchiopanciagialla",
"engine": "libsuola-sodium",
"date": "2018-04-26 20:18:30 +0000",
"openssl_v": "0x10101003",
"openssl_v_txt": "OpenSSL 1.1.1-pre3 (beta) 20 Mar 2018",
"argv": [
"/opt/openssl-111-pre3/bin/newspeed",
"-engine",
"libsuola-sodium",
"-evp_pkey_derive",
"X25519",
"-evp_pkey_sign",
"ED25519",
"-json",
"test.json"
],
"operations": [
{
"bits": 253,
"alg_name": "X25519",
"op_name": "evp_pkey_derive",
"op_id": "0x17f99b0",
"out_count": 1,
"in_count": 10,
"iterations_per_sample": 128,
"RUNS": [
{
"j": 0,
"values": [
136739,
142329,
135723,
137869,
141245,
135261,
136663,
135096,
135370,
135009
]
}
]
},
{
"bits": 253,
"alg_name": "ED25519",
"op_name": "evp_pkey_sign",
"op_id": "0x1800860",
"out_count": 1,
"in_count": 10,
"iterations_per_sample": 128,
"RUNS": [
{
"j": 0,
"values": [
83261,
84846,
89588,
90641,
83401,
85825,
93010,
88041,
83294,
83383
]
}
]
}
]
}