bitFlip is a C++ library with a single purpose:
the fast reversal of the order of bits in an arbitrary integer.
It was created to evaluate and benchmark different approaches to this simple task:
- benchmarked with Google Benchmark
- unit tested with GoogleTest (gtest)
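To pin down what "reversing the bits" means for a fixed-width type, here is a minimal reference sketch. The name `reverse_bits` is hypothetical and is not part of bitFlip. It moves bit 0 to the most significant position, bit 1 to the next one down, and so on. A slow but obviously correct version like this also works well as an oracle when unit-testing the faster variants.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Reference bit reversal for any unsigned integer type (hypothetical helper,
// not bitFlip's API). Bit i of the input ends up at bit (width - 1 - i).
template <typename T>
T reverse_bits(T x) {
    static_assert(std::is_unsigned<T>::value, "reverse_bits expects an unsigned type");
    T result = 0;
    for (unsigned i = 0; i < sizeof(T) * CHAR_BIT; ++i) {
        // Shift the result left and append the current lowest bit of x.
        result = static_cast<T>((result << 1) | (x & 1u));
        x = static_cast<T>(x >> 1);
    }
    return result;
}
```

For example, `reverse_bits<std::uint8_t>(0x01)` yields `0x80`, and `reverse_bits<std::uint32_t>(0x12345678)` yields `0x1E6A2C48`.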
To use it you need:
- a processor that supports Intel AVX extensions
- NASM (the Netwide Assembler) installed
- a GCC compiler that supports Intel intrinsics
Advanced Vector Extensions, as described in the Wikipedia article:
Advanced Vector Extensions (AVX) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge[1] processor shipping in Q1 2011 and later on by AMD with the Bulldozer[2] processor shipping in Q3 2011. AVX provides new features, new instructions and a new coding scheme.
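The intrinsic route builds on these extensions. There is one caveat. The first-generation AVX quoted above mainly widened floating-point operations to 256 bits. 256-bit integer byte shuffles such as `_mm256_shuffle_epi8` only arrived with AVX2 (Haswell, 2013).

Below is a minimal sketch of one common intrinsics technique. It is not bitFlip's code. It first looks up the bit-reversed value of every nibble with a byte shuffle, then reverses the byte order of each 64-bit element. It assumes GCC (or Clang) on x86-64 and uses GCC's `__builtin_cpu_supports` to check for AVX2 at run time. The function names `reverse64_scalar`, `reverse64x4_avx2` and `reverse64x4` are hypothetical.

```cpp
#include <cstdint>
#include <immintrin.h>

// Portable fallback: reverse the 64 bits of x one bit at a time.
static std::uint64_t reverse64_scalar(std::uint64_t x) {
    std::uint64_t r = 0;
    for (int i = 0; i < 64; ++i, x >>= 1) r = (r << 1) | (x & 1u);
    return r;
}

// Reverses four 64-bit values in place with AVX2 byte shuffles.
// The target attribute lets GCC compile this without -mavx2, so callers
// must confirm AVX2 support at run time before calling it.
__attribute__((target("avx2")))
static void reverse64x4_avx2(std::uint64_t* data) {
    // Bit-reversed value of each nibble 0..15, repeated for both 128-bit lanes
    // because _mm256_shuffle_epi8 only looks up bytes within a lane.
    const __m256i rev_nibble = _mm256_setr_epi8(
        0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE, 0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF,
        0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE, 0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF);
    // Byte indices that reverse the byte order inside each 64-bit element.
    const __m256i rev_bytes = _mm256_setr_epi8(
        7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8,
        7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8);
    const __m256i low4 = _mm256_set1_epi8(0x0F);

    __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data));
    __m256i lo = _mm256_and_si256(v, low4);                        // low nibbles
    __m256i hi = _mm256_and_si256(_mm256_srli_epi16(v, 4), low4);  // high nibbles
    // Reversed byte = reversed low nibble moved up | reversed high nibble moved down.
    __m256i rlo = _mm256_slli_epi16(_mm256_shuffle_epi8(rev_nibble, lo), 4);
    __m256i rhi = _mm256_shuffle_epi8(rev_nibble, hi);
    __m256i bits_reversed = _mm256_or_si256(rlo, rhi);
    // Finally reverse the byte order within each 64-bit element.
    v = _mm256_shuffle_epi8(bits_reversed, rev_bytes);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(data), v);
}

// Dispatch: use AVX2 when the CPU reports it, otherwise fall back to the loop.
static void reverse64x4(std::uint64_t* data) {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        reverse64x4_avx2(data);
    } else {
        for (int i = 0; i < 4; ++i) data[i] = reverse64_scalar(data[i]);
    }
}
```

To use it, put four values in a `std::uint64_t data[4]` and call `reverse64x4(data)`. For example, `0x0123456789ABCDEF` comes back as `0xF7B3D591E6A2C480`. Because of the `target("avx2")` attribute, no `-mavx2` flag is needed; the cost is the run-time check.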
bitFlip flips those bits using several different approaches:
- [Assembly] - win32 / elf, assembled with NASM. This routine is the fastest, if only just
- [Intel Implicit AVX] - using the handy GCC intrinsic routines
- [Naive approach] - a fun way to visualize a solution (also nice and terse)
- [Table approach] - lookup tables; let's see how the compiler optimizes these
- [Lambda Syntax] - just gotta try it out
- [Scala equivalent] - look at the svelteness of the code, and how does the JVM fare?
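The table, mask and lambda ideas in the list above can each be sketched in a few lines of portable C++. The following illustration works on 32-bit words and uses textbook formulations. It is not bitFlip's source, and the names (`make_reversed_byte_table`, `kReversedByte`, `reverse32_table`, `reverse32_mask`, `reverse32_lambda`) are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Table approach: precompute the bit-reversed value of every possible byte.
std::array<std::uint8_t, 256> make_reversed_byte_table() {
    std::array<std::uint8_t, 256> table{};
    for (unsigned b = 0; b < 256; ++b) {
        unsigned r = 0;
        for (unsigned i = 0; i < 8; ++i)
            if (b & (1u << i)) r |= 0x80u >> i;  // bit i -> bit 7 - i
        table[b] = static_cast<std::uint8_t>(r);
    }
    return table;
}

const std::array<std::uint8_t, 256> kReversedByte = make_reversed_byte_table();

// Reverse each byte through the table and mirror the byte positions.
std::uint32_t reverse32_table(std::uint32_t x) {
    return (std::uint32_t{kReversedByte[x & 0xFF]} << 24) |
           (std::uint32_t{kReversedByte[(x >> 8) & 0xFF]} << 16) |
           (std::uint32_t{kReversedByte[(x >> 16) & 0xFF]} << 8) |
           std::uint32_t{kReversedByte[x >> 24]};
}

// Mask approach: swap ever larger groups (bits, pairs, nibbles, bytes, halves).
std::uint32_t reverse32_mask(std::uint32_t x) {
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

// Lambda syntax: the naive bit-by-bit loop written as a lambda.
const auto reverse32_lambda = [](std::uint32_t x) {
    std::uint32_t r = 0;
    for (int i = 0; i < 32; ++i, x >>= 1) r = (r << 1) | (x & 1u);
    return r;
};
```

All three map 0x12345678 to 0x1E6A2C48. The table version spends 256 bytes of memory to replace per-bit work with four lookups. The mask version needs no memory and always runs five swap steps. That is the kind of trade-off the benchmarks compare.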
As a way to see how fast a single dedicated instruction can be, I also include the popcnt instruction, which requires no lashing together of assembly. It is benchmarked alongside the bit-flip routines (see BM_popcntIntrinsic in the results below).
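The popcount variants that appear in the benchmark names (Wegner, SWAR, intrinsic) correspond to well-known techniques. A minimal sketch of each follows. The function names are hypothetical, and GCC's `__builtin_popcountll` is shown as one way of reaching the POPCNT instruction, not necessarily the one bitFlip uses.

```cpp
#include <cstdint>

// Wegner's method: each iteration clears the lowest set bit, so the loop
// runs once per set bit.
int popcount_wegner(std::uint64_t x) {
    int count = 0;
    while (x) {
        x &= x - 1;
        ++count;
    }
    return count;
}

// SWAR: count bits in parallel within 2-, 4- and 8-bit fields, then add
// the eight byte counts together with a single multiply.
int popcount_swar64(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return static_cast<int>((x * 0x0101010101010101ULL) >> 56);
}

// Compiler builtin: becomes one POPCNT instruction when the target allows it.
int popcount_builtin(std::uint64_t x) {
    return __builtin_popcountll(x);
}
```

All three return 15 for 1000000000010 (0xE8D4A5100A). GCC only emits a single POPCNT instruction for the builtin when the target allows it, for example with `-mpopcnt` or `-march=native`. Otherwise it falls back to a software routine.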
bitFlip ships with both a CMake build and makefiles. The CMake configuration detects whether you need Google Benchmark and gtest, and downloads and installs them if they are missing.
$ mkdir cmake-build-release
$ cd cmake-build-release
$ cmake ..
$ cmake .. --help-usage
$ cmake -L | awk '{if(f)print} /-- Cache values/{f=1}'
Specify --help for usage, or press the help button on the CMake GUI.
CMAKE_BUILD_TYPE:STRING=Release
CMAKE_GNUtoMS:BOOL=OFF
CMAKE_INSTALL_PREFIX:PATH=C:/Program Files (x86)/bitFlip
NASM_EXE:FILEPATH=C:/msys2/mingw64/bin/nasm.exe
TESTAPP_ASMFORMAT:STRING=win64
TESTAPP_USE_ASM:BOOL=true
TESTAPP_USE_BENCHMARK:BOOL=true
TESTAPP_USE_TABLEHEADERS:BOOL=true
benchmark_ROOT_DIR:PATH=
shlwapi_ROOT_DIR:PATH=
example:
mkdir cmake-build-release
cd cmake-build-release
cmake .. -G 'MSYS Makefiles'
make
$ make help
This makefile supports the following configurations:
MinGW64_Release MinGW64_Debug Arch_Release Arch_Debug
and the following targets:
build (default target)
clean
clobber
all
help
Makefile Usage:
make [CONF=<CONFIGURATION>] [SUB=no] build
make [CONF=<CONFIGURATION>] [SUB=no] clean
make [SUB=no] clobber
make [SUB=no] all
make help
example (for MinGW):
$ make CONF=MinGW64_Release
tests/./flipBits --help
example:
tests/./flipBits.exe -o test.txt -f I 0 100
saves the flipped values for the range 0 to 100 to test.txt, in decimal
tests/./TestApp
Prints out bit-flipped integers (as large as a Scala BigInt) over the range [FROM] [TO].
To run it, open sbt and execute: sbt:bitflip> run [FROM] [TO]
$ sbt
Listening for transport dt_socket at address: 5005
[info] ...
sbt:bitflip> run 1000000000010 1000000000020
output:
[debug] Waiting for thread run-main-0 to terminate.
...
1000000000010-->343742425879
1000000000011-->893498239767
1000000000012-->206303472407
1000000000013-->756059286295
1000000000014-->481181379351
1000000000015-->1030937193239
1000000000016-->34504780567
1000000000017-->584260594455
1000000000018-->309382687511
1000000000019-->859138501399
1000000000020-->171943734039
[debug] Thread run-main-0 exited.
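The sample output suggests that the Scala version reverses each number within its own bit length rather than within a fixed-width word. For example, 1000000000010 is 0xE8D4A5100A, a 40-bit value, and reversing those 40 bits gives 343742425879.

Below is a minimal C++ sketch of that behaviour. It assumes inputs fit in 64 bits, whereas BigInt has no such limit. The behaviour is inferred from the output above, not taken from bitFlip's sources, and `reverse64_mask` and `flip_significant_bits` are hypothetical names.

```cpp
#include <cstdint>

// Full 64-bit reversal with the swap-mask technique
// (swap bits, pairs, nibbles, bytes, 16-bit halves, then 32-bit halves).
std::uint64_t reverse64_mask(std::uint64_t x) {
    x = ((x >> 1) & 0x5555555555555555ULL) | ((x & 0x5555555555555555ULL) << 1);
    x = ((x >> 2) & 0x3333333333333333ULL) | ((x & 0x3333333333333333ULL) << 2);
    x = ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL) | ((x & 0x0F0F0F0F0F0F0F0FULL) << 4);
    x = ((x >> 8) & 0x00FF00FF00FF00FFULL) | ((x & 0x00FF00FF00FF00FFULL) << 8);
    x = ((x >> 16) & 0x0000FFFF0000FFFFULL) | ((x & 0x0000FFFF0000FFFFULL) << 16);
    return (x >> 32) | (x << 32);
}

// Reverse only the significant bits of x, i.e. its binary digits without
// leading zeros, so a 40-bit input produces a result of at most 40 bits.
std::uint64_t flip_significant_bits(std::uint64_t x) {
    if (x == 0) return 0;                       // __builtin_clzll(0) is undefined
    const int width = 64 - __builtin_clzll(x);  // number of significant bits
    // After the full reversal the former leading zeros sit in the low bits;
    // shift them out.
    return reverse64_mask(x) >> (64 - width);
}
```

With it, `flip_significant_bits(1000000000010)` gives 343742425879 and `flip_significant_bits(1000000000016)` gives 34504780567, matching the listing above.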
$ tests/./flipBits -f I 1000000000010 1000000000020
Google Benchmark results, iterating over 10240 byte reversals:
Run on (8 X 3292 MHz CPU s)
11/10/17 10:12:26
----------------------------------------------------------------------
Benchmark                           Time          CPU   Iterations
----------------------------------------------------------------------
BM_Flip_AVX                       215 ns       214 ns      2986667   <- 8bit
BM_Flip_AVX16                    2548 ns      2511 ns       280000
BM_Flip_IntrAVXArr_VecClass       772 ns       767 ns       896000
BM_Flip_IntrAVXVec_VecClass       778 ns       785 ns       896000
BM_Flip_IntrAVXClass_ptr          412 ns       414 ns      1659259   <- 64bit The Winner
BM_Flip_IntrAVXClassNullBuffer 678206 ns    662667 ns          896
BM_Flip_IntrAVX64                 356 ns       353 ns      2036364
BM_Flip_IntrAVX64i256             746 ns       753 ns      1120000
BM_Flip_Table16                  3029 ns      2999 ns       224000
BM_Flip_Table32                  3812 ns      3850 ns       186667
BM_Flip_Naive                    9830 ns      9766 ns        64000
BM_Flip_Naive64                 11952 ns     11963 ns        64000
BM_Flip_Mask                    16748 ns     16497 ns        40727
BM_Flip_lloop                   25314 ns     24902 ns        26353
BM_Flip_NaiveArrayll            26762 ns     26228 ns        28000
BM_Flip_NaiveLambda             25067 ns     25181 ns        23579
BM_popcntWegner                 30923 ns     30692 ns        22400
BM_popcnt_wegner_lambda        128452 ns    128348 ns         5600
BM_popcnt_SWAR64                 4036 ns      4018 ns       186667
BM_popcnt16                     18439 ns     18415 ns        40727
BM_popcnt32                      8273 ns      8196 ns        89600
BM_popcnt64                      5287 ns      5312 ns       100000
BM_popcntIntrinsic                  2 ns         2 ns    320000000
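Google Benchmark chooses iteration counts and reports statistics, so it is the right tool for numbers like those above. Purely to illustrate what timing one pass over 10240 byte reversals involves, here is a rough, dependency-free stand-in built on `std::chrono`. `BatchTiming` and `time_batches` are hypothetical names. `reverse8` is the well-known multiply-and-modulus trick from Sean Anderson's Bit Twiddling Hacks, not necessarily one of bitFlip's variants.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte reversal with the multiply-and-modulus trick (Bit Twiddling Hacks):
// spread copies of the byte, keep one bit per copy, then fold with % 1023.
inline std::uint8_t reverse8(std::uint8_t b) {
    return static_cast<std::uint8_t>(((b * 0x0202020202ULL) & 0x010884422010ULL) % 1023);
}

struct BatchTiming {
    double ns_per_batch;     // average wall-clock time for one pass over the batch
    std::uint64_t checksum;  // sum of all results, so the work cannot be discarded
};

// Times `batches` passes over 10240 input bytes (the batch size quoted above).
template <typename Fn>
BatchTiming time_batches(Fn reverse, int batches) {
    std::vector<std::uint8_t> input(10240);
    for (std::size_t i = 0; i < input.size(); ++i)
        input[i] = static_cast<std::uint8_t>(i);  // cycles through 0..255

    std::uint64_t checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int b = 0; b < batches; ++b)
        for (std::uint8_t v : input) checksum += reverse(v);
    const auto stop = std::chrono::steady_clock::now();

    const double total_ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return {total_ns / batches, checksum};
}
```

For example, `time_batches(reverse8, 1000)` reports the average time of one pass. Expect noisier numbers than Google Benchmark gives, since Google Benchmark picks the iteration counts itself (the Iterations column above) and reports wall-clock and CPU time separately.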
ScalaTest execution time for iterating over 10240 reversals:
[info] - bitFlip inline benchmark with warming
[info] Run completed in 6 seconds, 165 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
Please feel free...
MIT