About
I wrote these benchmarks for a presentation on "Performance Tips, Tricks, and Gotchas". They compare several ways of doing the same thing in C++ that look similar on the surface but can differ significantly in performance. Writing them was a good learning opportunity: I learned how to write benchmarks in the process, and although I already knew that in principle these approaches perform differently, I had never actually taken the time to measure the differences.
Benchmarks include measurements for:
- Function call overhead: virtual vs. non-virtual member functions vs. inline functions vs. lambdas vs. std::function
- Effects of data locality/cache misses
- False sharing between threads
- Using mutexes vs. atomics
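To make the function-call bullet concrete, here is a minimal sketch of the call styles being compared. It is not the code in benchmarks.cpp: `Base`, `Doubler`, `PlainDoubler` and the `sum*` helpers are hypothetical names chosen for illustration. The difference being measured is how much the compiler knows about the call target. A virtual call through a base-class pointer and a `std::function` call both go through an indirection at run time, while a lambda passed as a template parameter has its own unique type, which the compiler can usually inline.

```cpp
#include <functional>
#include <vector>

// Hypothetical types for illustration; not taken from benchmarks.cpp.
struct Base {
    virtual ~Base() = default;
    virtual int apply(int x) const = 0;
};

struct Doubler final : Base {
    int apply(int x) const override { return 2 * x; }
};

// Plain non-virtual member function: the call target is known at compile time.
struct PlainDoubler {
    int apply(int x) const { return 2 * x; }
};

// Virtual dispatch through a pointer to the base class: the target is looked
// up through the vtable at run time unless the compiler can devirtualize it.
int sumVirtual(const Base* obj, const std::vector<int>& xs) {
    int total = 0;
    for (int x : xs) total += obj->apply(x);
    return total;
}

// Direct call to a non-virtual member function.
int sumNonVirtual(const PlainDoubler& obj, const std::vector<int>& xs) {
    int total = 0;
    for (int x : xs) total += obj.apply(x);
    return total;
}

// Callable as a template parameter: each lambda has its own type, so the
// compiler sees the exact function being called and can inline it.
template <typename F>
int sumTemplate(F&& f, const std::vector<int>& xs) {
    int total = 0;
    for (int x : xs) total += f(x);
    return total;
}

// std::function erases the callable's type, so each call is an indirect call.
int sumStdFunction(const std::function<int(int)>& f, const std::vector<int>& xs) {
    int total = 0;
    for (int x : xs) total += f(x);
    return total;
}
```

When timing functions like these, the results have to stay observable (for example with Google Benchmark's `benchmark::DoNotOptimize`), otherwise the compiler may remove the loop altogether.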
This is a work in progress and there may be mistakes. There are also a few TODOs left in benchmarks.cpp that are worth paying attention to. I'll clean this up more in the following weeks.
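The data-locality bullet can be illustrated with a short sketch that sums the same values three ways: sequentially through a `std::vector`, through the same vector in a shuffled order, and through a `std::list`. The function names are hypothetical, and it is an assumption that the list benchmark uses `std::list`. The arithmetic is identical in all three; only the memory access pattern changes, which is what the sequential vs. random and smaller-than-L1 vs. bigger-than-L1 benchmarks measure.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <numeric>
#include <random>
#include <vector>

// Sequential traversal: neighbouring elements share cache lines, and the
// hardware prefetcher can load upcoming lines before they are needed.
long long sumSequential(const std::vector<int>& data) {
    long long total = 0;
    for (std::size_t i = 0; i < data.size(); ++i) total += data[i];
    return total;
}

// Random traversal: the same elements visited in a shuffled order. Once the
// array no longer fits in a given cache level, most accesses miss it.
long long sumRandom(const std::vector<int>& data,
                    const std::vector<std::size_t>& order) {
    long long total = 0;
    for (std::size_t idx : order) total += data[idx];
    return total;
}

// Linked-list traversal: each node is a separate allocation, so every step
// follows a pointer to a potentially distant address.
long long sumList(const std::list<int>& data) {
    long long total = 0;
    for (int x : data) total += x;
    return total;
}

// Hypothetical helper: a shuffled visiting order 0..n-1 with a fixed seed
// so runs are reproducible.
std::vector<std::size_t> shuffledIndices(std::size_t n, unsigned seed = 42) {
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}
```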
How to Install and Run
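# Install Conan, which is used to fetch Google Benchmark.
sudo apt-get install python3-venv
python3 -m venv pyenv
source pyenv/bin/activate
# The profile commands below use Conan 1.x syntax (removed in Conan 2), so pin the major version.
pip install "conan<2"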
# Configure conan.
conan profile new default --detect
conan profile update settings.compiler.libcxx=libstdc++11 default
mkdir build && cd build
# Download Google Benchmark
conan install ..
# Configure CMake
cmake .. -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release
# Build the benchmarks
cmake --build .
# Run the benchmarks
./bin/benchmarks
Output on my machine:
Running ./build/bin/benchmarks
Run on (16 X 3396.7 MHz CPU s)
CPU Caches:
  L1 Data 32K (x8)
  L1 Instruction 32K (x8)
  L2 Unified 1024K (x8)
  L3 Unified 25344K (x1)
Load Average: 0.00, 0.11, 0.27
----------------------------------------------------------------------------------------
Benchmark                                              Time          CPU   Iterations
----------------------------------------------------------------------------------------
BM_virtualFunctionCallsThroughPointerToParent       2.51 ns      2.51 ns    294239104
BM_virtualFunctionCallsThroughPointerToChild        1.61 ns      1.61 ns    433945778
BM_virtualFunctionCallsThroughInstanceOfChild      0.295 ns     0.295 ns   1000000000
BM_nonVirtualNonInlineFunctionCall                  3.24 ns      3.24 ns    215936970
BM_inlineFunctionCall                              0.295 ns     0.295 ns   1000000000
BM_noFunctionCall                                  0.295 ns     0.295 ns   1000000000
BM_stdFunctionCall                                  1.77 ns      1.77 ns    395824197
BM_lambdaFunctionCall                              0.295 ns     0.295 ns   1000000000
BM_stdFunctionPassedAsParameterFunctionCall         2.06 ns      2.06 ns    339063066
BM_lambdaPassedAsParameterFunctionCall             0.295 ns     0.295 ns   1000000000
BM_sequentialListAccess                             1367 ns      1367 ns       440074
BM_sequentialArrayAccess                             148 ns       148 ns      4704685
BM_sequentialArrayAccessSmallerThanL1               47.8 ns      47.8 ns     14637763
BM_randomArrayAccessSmallerThanL1                   99.9 ns      99.9 ns      7034840
BM_sequentialArrayAccessBiggerThanL1              158070 ns    158063 ns         4429
BM_randomArrayAccessBiggerThanL1                  727313 ns    727301 ns          839
BM_falseSharing/manual_time                      2841579 ns     41720 ns          246
BM_noFalseSharing/manual_time                    2144634 ns     39660 ns          326
BM_useMutex/manual_time                        116323172 ns     50073 ns            6
BM_useMutexNoContention/manual_time             15982798 ns     36639 ns           44
BM_useAtomic/manual_time                        28326920 ns     39443 ns           25
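The last five rows cover the two threading topics. The sketch below shows the shape of each comparison; the names are hypothetical and it assumes a 64-byte cache line. Two per-thread counters are either packed next to each other or padded onto separate cache lines (false sharing vs. no false sharing), and a single shared counter is incremented either under a `std::mutex` or as a `std::atomic`. It only demonstrates correct behaviour; the timing itself would come from a harness such as Google Benchmark, which reports these benchmarks with the `/manual_time` suffix.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Two counters packed next to each other will very likely share one cache
// line, so writes from different threads keep invalidating each other's copy
// (false sharing) even though no variable is actually shared.
struct PackedCounters {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Assumes a 64-byte cache line. C++17 also provides
// std::hardware_destructive_interference_size, but library support varies.
struct PaddedCounters {
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};

// Each thread increments only its own counter.
template <typename Counters>
void incrementSeparately(Counters& c, long iterations) {
    std::thread t1([&] {
        for (long i = 0; i < iterations; ++i) c.a.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        for (long i = 0; i < iterations; ++i) c.b.fetch_add(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
}

// Shared counter protected by a mutex: every increment locks and unlocks.
long countWithMutex(int numThreads, long iterations) {
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t) {
        threads.emplace_back([&] {
            for (long i = 0; i < iterations; ++i) {
                std::lock_guard<std::mutex> lock(m);
                ++counter;
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}

// Shared atomic counter: each increment is one atomic read-modify-write.
// Relaxed ordering is enough because join() synchronizes before the load.
long countWithAtomic(int numThreads, long iterations) {
    std::atomic<long> counter{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t) {
        threads.emplace_back([&] {
            for (long i = 0; i < iterations; ++i) counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : threads) th.join();
    return counter.load();
}
```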