Benchmark not measuring what you expect
chris0e3 opened this issue · comments
Hello,
Thanks for writing wyhash.
It appears to me that benchmark.cpp
in not really measuing what you/I expect/think.
It calls std::string::size()
for every hash call. This can be relatively expensive (probably due to the short-string optimisation). [The mean length of words in /usr/share/dict/words is ~10, which is a short-string on 64-bit.]
If I run your unmodified benchmark.cpp
on my (aging) i7 Mac I get something like:
|wyhash |122.21 |13.28 |16.87 |
If I modify benchmark.cpp
to first build then use a vector
of std::string_view
to avoid calling std::string::size()
I get:
|wyhash |168.38 |13.10 |16.89 |
This is about 37% more short hashes/µs!
Both were compiled with Clang using:
c++ benchmark.cpp -o benchmark -O3 -Wall -std=c++17 -march=corei7 -mavx -fno-stack-protector
But -std=c++11
works equally well.
Below is the (short) patch I used to change this.
The section from line 50 - 60 is less essential, but does improve the bulk
results.
--- benchmark.cpp 2022-11-03 05:55:09.000000000 +0000
+++ benchmark_new.cpp 2022-11-03 22:23:23.000000000 +0000
@@ -19 +19 @@
- size_t operator()(const string &s, uint64_t seed)const
+ size_t operator()(string_view s, uint64_t seed)const
@@ -22 +22 @@
- return wyhash(s.c_str(),s.size(),seed,_wyp);
+ return wyhash(s.data(),s.size(),seed,_wyp);
@@ -24 +24 @@
- return XXH3_64bits_withSeed(s.c_str(),s.size(),seed);
+ return XXH3_64bits_withSeed(s.data(),s.size(),seed);
@@ -29 +29 @@
-vector <string> v;
+vector<string_view> v;
@@ -50 +50 @@
- s.resize(1<<8);
+ s.resize(1<<8); string_view sv(s);
@@ -53 +53 @@
- dummy+=h(s,r);
+ dummy+=h(sv,r);
@@ -57 +57 @@
- s.resize(1<<16);
+ s.resize(1<<16); sv = s;
@@ -60 +60 @@
- dummy+=h(s,r);
+ dummy+=h(sv,r);
@@ -85,0 +86 @@
+ vector<string> sv;
@@ -88 +89 @@
- v.push_back(s);
+ sv.push_back(s);
@@ -91,2 +92,3 @@
- for(size_t i=v.size()-1; i; i--)
- swap(v[i],v[rand()%(i+1)]);
+ for(size_t i=sv.size()-1; i; i--)
+ swap(sv[i],sv[rand()%(i+1)]);
+ v.assign(sv.cbegin(), sv.cend());
BTW If I’m reading the code correctly, the GB/s
in the bulk
results seems to be 1000 * 1024 * 1024
. Which is not a GB or a GiB, but a thousand MB.
Please feel free to use this patch how & if you like.
PS I also found that adding something like the following immediately above line 99 #ifndef XXH3
to be useful:
#ifdef REPS
for (int repeat = REPS; repeat > 0; --repeat)
#endif
It allows adding something like -DREPS=5
to the compile command to ease testing.