benchmark forked from datalevin

extended to include datahike
munged the core bench stuff to work on local windows setup
benchmark code should be equivalent (just adapted for API nuances)

Invocation

Designed to be run from clojure cli. Assuming you have that installed and the `clojure` or `clj` command available:

clj -m bench

Invokes a somewhat lengthy processing of all the libs and query benchmarks. There should be support similar to the original for passing in specific libs to test, right now all versions are “latest” and really only resolve to what’s in the deps.edn; so there’s a difference in that respect.

Benchmarks

Median run times in ms (per extant benchmarking infrastructure):

Windows 10

JDK

openjdk version “1.8.0_222”
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_222-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.222-b10, mixed mode)
4gb Xmx

Hardware

Dell XPS 15 9550

Results (Fresh DB)

If we run the bench for the first time, we get good performance. On windows the tmp path will just populate a database at c:\tmp\blah\ . The database file is about 100mb.

version	q1	q2	q3	q4	qpred1	qpred2
latest-datascript	2.7	6.4	9.3	14.8	8.4	31.4
latest-datalevin	0.79	3.5	4.6	6.7	8.1	9.6
latest-datahike-mem	10.6	22.7	35.4	48.9	24.4	46.4
latest-datahike-file	11.3	23.2	35.3	49.4	24.0	46.0

Results (Degenerate)

In the original version of the benchmark, we don’t clean up this resource, so datalevin will re-use the existing database every time, which can lead to growing the database (lots of new random entries) from a 20K person db to far more, leading to a deceptive benchmark. By the time this was pointed out, I had run the benchmark many times, which ultimately led to a 1.3gb database file. That explains the performance differential!

version	q1	q2	q3	q4	qpred1	qpred2
latest-datascript	2.3	6.2	11.3	16.8	10.3	38.0
latest-datalevin	5.4	23.6	26.8	41.6	53.3	60.4
latest-datahike-mem	12.2	25.0	35.1	47.9	23.7	46.2
latest-datahike-file	10.6	22.1	34.6	47.8	23.8	46.4

Ubuntu

Hardware (EC2 Instance)

2.3 GHz Intel Xeon® E5-2686 v4 (Broadwell) processors or 2.4 GHz Intel Xeon® E5-2676 v3 (Haswell) processors

Model	vCPU*	Mem (GiB)	Storage	Dedicated EBS Bandwidth (Mbps)	Network Performance
m4.large	2	8	EBS-only	450	Moderate

JDK

openjdk version “1.8.0_265”
OpenJDK Runtime Environment (build 1.8.0_265-8u265-b01-0ubuntu2~16.04-b01)
OpenJDK 64-Bit Server VM (build 25.265-b01, mixed mode)
4gb Xmx

Results

version	q1	q2	q3	q4	qpred1	qpred2
latest-datascript	2.9	7.9	11.6	18.4	10.5	35.5
latest-datalevin	0.99	4.5	5.7	8.7	9.9	12.0
latest-datahike-mem	16.6	34.5	53.5	74.9	33.2	58.1
latest-datahike-file	16.1	34.1	53.8	72.4	33.0	58.0

Ensure resource cleanup on windows for datalevin, since it will reuse the database (of course). Datahike and datascript didn’t experience this since they cleaned up / deleted their databases as part of the original benchmark.

About

a fork and extension of the datalevin benchmark to include datahike and work on windows

Eclipse Public License 1.0

Languages

Language:Clojure 100.0%