Benchmark not measuring what you expect

Question

Benchmark not measuring what you expect

chris0e3 opened this issue 2 years ago · comments

Hello,

Thanks for writing wyhash.

It appears to me that benchmark.cpp in not really measuing what you/I expect/think.
It calls std::string::size() for every hash call. This can be relatively expensive (probably due to the short-string optimisation). [The mean length of words in /usr/share/dict/words is ~10, which is a short-string on 64-bit.]

If I run your unmodified benchmark.cpp on my (aging) i7 Mac I get something like:

|wyhash		|122.21		|13.28		|16.87		|

If I modify benchmark.cpp to first build then use a vector of std::string_view to avoid calling std::string::size() I get:

|wyhash		|168.38		|13.10		|16.89		|

This is about 37% more short hashes/µs!

Both were compiled with Clang using:

c++ benchmark.cpp -o benchmark -O3 -Wall -std=c++17 -march=corei7 -mavx -fno-stack-protector

But -std=c++11 works equally well.

Below is the (short) patch I used to change this.
The section from line 50 - 60 is less essential, but does improve the bulk results.

--- benchmark.cpp	2022-11-03 05:55:09.000000000 +0000
+++ benchmark_new.cpp	2022-11-03 22:23:23.000000000 +0000
@@ -19 +19 @@
-    size_t operator()(const string &s, uint64_t seed)const
+    size_t operator()(string_view s, uint64_t seed)const
@@ -22 +22 @@
-        return wyhash(s.c_str(),s.size(),seed,_wyp);
+        return wyhash(s.data(),s.size(),seed,_wyp);
@@ -24 +24 @@
-        return XXH3_64bits_withSeed(s.c_str(),s.size(),seed);
+        return XXH3_64bits_withSeed(s.data(),s.size(),seed);
@@ -29 +29 @@
-vector <string> v;
+vector<string_view> v;
@@ -50 +50 @@
-    s.resize(1<<8);
+    s.resize(1<<8); string_view sv(s);
@@ -53 +53 @@
-        dummy+=h(s,r);
+        dummy+=h(sv,r);
@@ -57 +57 @@
-    s.resize(1<<16);
+    s.resize(1<<16); sv = s;
@@ -60 +60 @@
-        dummy+=h(s,r);
+        dummy+=h(sv,r);
@@ -85,0 +86 @@
+    vector<string> sv;
@@ -88 +89 @@
-            v.push_back(s);
+            sv.push_back(s);
@@ -91,2 +92,3 @@
-    for(size_t i=v.size()-1; i; i--)
-        swap(v[i],v[rand()%(i+1)]);
+    for(size_t i=sv.size()-1; i; i--)
+        swap(sv[i],sv[rand()%(i+1)]);
+    v.assign(sv.cbegin(), sv.cend());

BTW If I’m reading the code correctly, the GB/s in the bulk results seems to be 1000 * 1024 * 1024. Which is not a GB or a GiB, but a thousand MB.

Please feel free to use this patch how & if you like.

PS I also found that adding something like the following immediately above line 99 #ifndef XXH3 to be useful:

#ifdef REPS
	for (int repeat = REPS; repeat > 0; --repeat)
#endif

It allows adding something like -DREPS=5 to the compile command to ease testing.