Research-level implementations of some SIMD algorithms.
The implementation is based on the algorithm presented in this blog post.
The original paper is here.
std::vector<int32_t> values
= { 0, 4, 6, 20, 40, 60, 90, 155,
1188, 2002, 2244, 2296, 3124, 3226, 3334, 4443,
10, 24, 46, 120, 140, 260, 390, 455,
2188, 3002, 4244, 5296, 6124, 6226, 6334, 6443 };
floki::sort(begin(values),end(values));
Clang 3.4 on Linux
There are some known issues with g++ and Boost SIMD. See the comments about aliasing on this issue. Until this is resolved, the code won't compile with g++ unless the following flags are set:
-fno-strict-aliasing -DBOOST_SIMD_NO_STRICT_ALIASING
The optional unit tests require the Bandit testing framework.
Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz
compiled with clang 3.4 and flags -O3 -mavx
65536 random elements in std::vector
data type | std::sort | floki::sort |
---|---|---|
int32_t | 2.99ms | 0.92ms |
float | 3.33ms | 1.10ms |
There is a CMake build file included that builds the benchmark and unit tests.
I generate the Makefile using clang as follows:
mkdir build-floki
cd build-floki
CXX=/usr/bin/clang++-3.5 cmake ~/coding/sorting//floki/ -DCMAKE_CXX_FLAGS="-march=native -O3"
make
Be sure to use the -O3 flag for performance!
If Boost SIMD is installed to a path other than /usr/local, then set BoostSIMD_INCLUDE_DIR accordingly.
The benchmark is now run using hard-coded variables (a quick and dirty test!), held in two std::vector objects:
- one specifying the set of element counts for the vector to be sorted.
- one specifying the set of ranges from which the element values are drawn using a uniform distribution. Each range value encodes the maximum value of the distribution; the minimum is 0. The values are therefore drawn from [0, range_value] for each range_value in the ranges vector.
The benchmark is then run for each possible pair (#elements, #range).
After each run, the vector is checked to be sorted as a verification step. A rough but useful printout on standard output reports the benchmark results.
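The benchmark loop described above can be sketched roughly as follows. This is not the benchmark shipped in this repository, only a minimal illustration of the same idea. The names `BenchResult`, `make_input`, `run_one`, `run_benchmark` and `StdSortStandIn` are hypothetical, the seed is arbitrary, and `std::sort` is used as a stand-in so the sketch compiles without Boost SIMD. To time `floki::sort` instead, one would presumably pass a functor that forwards to it.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

// One measurement for a single (#elements, range) pair. Hypothetical type.
struct BenchResult {
    std::size_t num_elements;
    std::int32_t range_value;
    double milliseconds;
    bool sorted;
};

// Stand-in sorter (assumption): forwards to std::sort so the sketch is
// self-contained. Replace the body with a call to the sort under test.
struct StdSortStandIn {
    template <typename It>
    void operator()(It first, It last) const { std::sort(first, last); }
};

// Fill a vector with values drawn uniformly from [0, range_value].
std::vector<std::int32_t> make_input(std::size_t num_elements,
                                     std::int32_t range_value,
                                     std::mt19937& rng)
{
    assert(range_value >= 0);
    std::uniform_int_distribution<std::int32_t> dist(0, range_value);
    std::vector<std::int32_t> values(num_elements);
    for (auto& v : values) v = dist(rng);
    return values;
}

// Time one sort call and verify that the output is sorted.
template <typename Sorter>
BenchResult run_one(std::size_t num_elements, std::int32_t range_value,
                    std::mt19937& rng, Sorter sorter)
{
    std::vector<std::int32_t> values = make_input(num_elements, range_value, rng);
    const auto start = std::chrono::steady_clock::now();
    sorter(values.begin(), values.end());
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    const bool sorted = std::is_sorted(values.begin(), values.end());
    return { num_elements, range_value, elapsed.count(), sorted };
}

// Run the benchmark for every (#elements, range) pair and print each result.
template <typename Sorter>
std::vector<BenchResult> run_benchmark(const std::vector<std::size_t>& sizes,
                                       const std::vector<std::int32_t>& ranges,
                                       Sorter sorter)
{
    std::mt19937 rng(42);  // fixed, arbitrary seed for repeatable inputs
    std::vector<BenchResult> results;
    for (std::size_t n : sizes) {
        for (std::int32_t r : ranges) {
            const BenchResult res = run_one(n, r, rng, sorter);
            std::cout << "elements=" << res.num_elements
                      << " range=[0;" << res.range_value << "]"
                      << " time=" << res.milliseconds << "ms"
                      << " sorted=" << (res.sorted ? "yes" : "NO") << '\n';
            results.push_back(res);
        }
    }
    return results;
}
```

Calling `run_benchmark({1024, 65536}, {100, 1000000}, StdSortStandIn())` from a `main` function would time four combinations and print one line per pair. A single run per pair is noisy, so averaging over several repetitions would give more trustworthy numbers.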