huahaiy / datalevinbench

a fork and extension of the datalevin benchmark to include datahike and work on windows

Benchmark forked from Datalevin

  • Extended to include Datahike.
  • Adapted the core benchmark code to run on a local Windows setup.
  • The benchmark code should be equivalent across libraries (adapted only for API differences).

Invocation

Designed to be run from the Clojure CLI. Assuming you have it installed and the `clojure` or `clj` command is available:

clj -m bench

This runs a fairly lengthy pass over all the libraries and query benchmarks. There should be support, similar to the original, for passing in specific libraries to test. Right now all versions are “latest” and simply resolve to whatever is in deps.edn, so this fork differs from the original in that respect.

Benchmarks

Median run times in ms (per extant benchmarking infrastructure):

Windows 10

JDK

  • openjdk version “1.8.0_222”
  • OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_222-b10)
  • OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.222-b10, mixed mode)
  • 4gb Xmx

Hardware

  • Dell XPS 15 9550

Results (Fresh DB)

When the benchmark runs for the first time, Datalevin performs well. On Windows the tmp path will just populate a database at c:\tmp\blah\. The database file is about 100 MB.

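| version | q1 | q2 | q3 | q4 | qpred1 | qpred2 |
|---|---|---|---|---|---|---|
| latest-datascript | 2.7 | 6.4 | 9.3 | 14.8 | 8.4 | 31.4 |
| latest-datalevin | 0.79 | 3.5 | 4.6 | 6.7 | 8.1 | 9.6 |
| latest-datahike-mem | 10.6 | 22.7 | 35.4 | 48.9 | 24.4 | 46.4 |
| latest-datahike-file | 11.3 | 23.2 | 35.3 | 49.4 | 24.0 | 46.0 |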
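
The figures in these tables are median run times in ms. The benchmark's own measurement code is not reproduced here, so the following is only a generic sketch of how a median timing could be taken in Clojure. The names `median`, `time-ms`, and `median-time-ms`, as well as the `:warmup` and `:runs` defaults, are hypothetical and are not taken from the repository.

```clojure
(defn median
  "Returns the median of a non-empty collection of numbers."
  [xs]
  (let [sorted (vec (sort xs))
        n      (count sorted)
        mid    (quot n 2)]
    (if (odd? n)
      (sorted mid)
      ;; Even count: average the two middle values.
      (/ (+ (sorted (dec mid)) (sorted mid)) 2.0))))

(defn time-ms
  "Calls the zero-argument function f once and returns the elapsed
  wall-clock time in milliseconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

(defn median-time-ms
  "Calls f `warmup` times untimed (to let the JIT settle), then `runs`
  times timed, and returns the median time in milliseconds.
  The default counts are assumptions chosen for illustration."
  [f {:keys [warmup runs] :or {warmup 5 runs 11}}]
  (dotimes [_ warmup] (f))
  (median (doall (repeatedly runs #(time-ms f)))))
```

A call such as `(median-time-ms #(reduce + (range 100000)) {:runs 21})` would time that reduction 21 times after the warm-up calls. The median is less sensitive than the mean to an occasional slow run caused by GC pauses or disk activity.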

Results (Degenerate)

In the original version of the benchmark, this resource is not cleaned up, so Datalevin reuses the existing database on every run. Each run adds many new random entries, so the database grows from a 20K-person DB to something far larger, which makes the benchmark deceptive. By the time this was pointed out, I had run the benchmark many times, and the database file had grown to 1.3 GB. That explains the performance differential!

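| version | q1 | q2 | q3 | q4 | qpred1 | qpred2 |
|---|---|---|---|---|---|---|
| latest-datascript | 2.3 | 6.2 | 11.3 | 16.8 | 10.3 | 38.0 |
| latest-datalevin | 5.4 | 23.6 | 26.8 | 41.6 | 53.3 | 60.4 |
| latest-datahike-mem | 12.2 | 25.0 | 35.1 | 47.9 | 23.7 | 46.2 |
| latest-datahike-file | 10.6 | 22.1 | 34.6 | 47.8 | 23.8 | 46.4 |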
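
To put a number on the differential, the Datalevin rows from the fresh and degenerate Windows runs can be divided query by query. This is a small sketch using only the figures reported above; the names `fresh`, `degenerate`, and `slowdown` are made up for illustration.

```clojure
;; Median times in ms for latest-datalevin, copied from the two
;; Windows result tables (fresh DB vs. degenerate DB).
(def fresh      {:q1 0.79 :q2 3.5  :q3 4.6  :q4 6.7  :qpred1 8.1  :qpred2 9.6})
(def degenerate {:q1 5.4  :q2 23.6 :q3 26.8 :q4 41.6 :qpred1 53.3 :qpred2 60.4})

(defn slowdown
  "Returns a map from query name to (degenerate time / fresh time)."
  [fresh degenerate]
  (into {} (for [[q t] fresh]
             [q (/ (degenerate q) t)])))

;; Print one line per query, e.g. the query name followed by its ratio.
(doseq [[q ratio] (sort-by key (slowdown fresh degenerate))]
  (println (name q) (format "%.1fx" ratio)))
```

On these numbers, every Datalevin query comes out roughly 6 to 7 times slower against the oversized database, while the DataScript and Datahike rows barely move between the two tables.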

Ubuntu

Hardware (EC2 Instance)

  • 2.3 GHz Intel Xeon® E5-2686 v4 (Broadwell) processors or 2.4 GHz Intel Xeon® E5-2676 v3 (Haswell) processors

| Model | vCPU* | Mem (GiB) | Storage | Dedicated EBS Bandwidth (Mbps) | Network Performance |
|---|---|---|---|---|---|
| m4.large | 2 | 8 | EBS-only | 450 | Moderate |

JDK

  • openjdk version “1.8.0_265”
  • OpenJDK Runtime Environment (build 1.8.0_265-8u265-b01-0ubuntu2~16.04-b01)
  • OpenJDK 64-Bit Server VM (build 25.265-b01, mixed mode)
  • 4gb Xmx

Results

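| version | q1 | q2 | q3 | q4 | qpred1 | qpred2 |
|---|---|---|---|---|---|---|
| latest-datascript | 2.9 | 7.9 | 11.6 | 18.4 | 10.5 | 35.5 |
| latest-datalevin | 0.99 | 4.5 | 5.7 | 8.7 | 9.9 | 12.0 |
| latest-datahike-mem | 16.6 | 34.5 | 53.5 | 74.9 | 33.2 | 58.1 |
| latest-datahike-file | 16.1 | 34.1 | 53.8 | 72.4 | 33.0 | 58.0 |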

Ensure resource cleanup on Windows for Datalevin, since it will otherwise reuse the existing database. Datahike and DataScript did not run into this because the original benchmark already cleaned up (deleted) their databases.
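
One way to get that cleanup is to delete the database directory before and after each benchmark run. The sketch below is an assumption about how this could be done, not the fix actually used in this repository; `delete-recursively!` and `with-fresh-dir` are hypothetical helper names, and the directory passed in is whatever path the benchmark uses for its Datalevin database.

```clojure
(require '[clojure.java.io :as io])

(defn delete-recursively!
  "Deletes the file or directory at path, including everything under it.
  Does nothing if the path does not exist."
  [path]
  (let [root (io/file path)]
    (when (.exists root)
      ;; file-seq lists a directory before its children, so walk the
      ;; reversed sequence to delete children before their parents.
      (doseq [f (reverse (file-seq root))]
        (io/delete-file f)))))

(defn with-fresh-dir
  "Calls (f dir) with dir wiped beforehand and again afterwards, so that
  every run starts from an empty database directory."
  [dir f]
  (delete-recursively! dir)
  (try
    (f dir)
    (finally
      (delete-recursively! dir))))
```

Note that `io/delete-file` throws if a file cannot be removed. On Windows that can happen while the database is still open, so the connection would need to be closed before the final delete.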

License: Eclipse Public License 1.0

