This project is a small header file that generates constants and code to deinterleave Morton numbers, also known as Z-order curve indices, with an arbitrary number of dimensions, in an arbitrarily sized integer.
Project structure:
zorder.h
: main header file which contains the deinterleave function.
benchmark
: binary target which generates a small executable that compares the speed of std::div vs deinterleave.
demo
: minimal binary target, which just calls deinterleave_one on the standard input. Can be used to easily inspect the generated assembly.
To make the program as efficient as possible, all the constants and the recursive functions are generated via template metaprogramming; when compiling optimized code (e.g. using -O3), most of the types and functions are stripped out to leave very simple assembly code.
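To illustrate the idea of generating the masks and the shift steps from the number of dimensions and the integer type, here is a minimal sketch. It is not the code in zorder.h: the names make_mask and deinterleave_sketch are hypothetical, and it uses C++14 constexpr functions with a loop instead of the recursive template classes described above. It assumes an unsigned integer type and at least two dimensions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: a mask made of runs of `group` set bits that repeat
// every `dims * group` bits, starting at bit 0.
template <typename T>
constexpr T make_mask(unsigned dims, unsigned group) {
    const unsigned width = std::numeric_limits<T>::digits;
    T mask = 0;
    for (unsigned i = 0; i < width; ++i) {
        if (i % (dims * group) < group) {
            mask = static_cast<T>(mask | (T(1) << i));
        }
    }
    return mask;
}

// Hypothetical generalization: gather every Dims-th bit of x
// (bits 0, Dims, 2 * Dims, ...) into the low bits of the result.
template <unsigned Dims, typename T>
constexpr T deinterleave_sketch(T x) {
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer");
    static_assert(Dims >= 2, "at least two dimensions are expected");
    const unsigned width = std::numeric_limits<T>::digits;
    const unsigned out_bits = (width + Dims - 1) / Dims;  // bits per coordinate
    x = static_cast<T>(x & make_mask<T>(Dims, 1));
    // Each pass doubles the run length: the next run is shifted down next to
    // its lower neighbour, then everything outside the new runs is masked off.
    for (unsigned group = 1; group < out_bits; group *= 2) {
        x = static_cast<T>((x | (x >> ((Dims - 1) * group)))
                           & make_mask<T>(Dims, 2 * group));
    }
    return x;
}

// The masks are plain compile-time constants.
static_assert(make_mask<std::uint32_t>(2, 1) == 0x55555555u, "2D, runs of 1");
static_assert(make_mask<std::uint32_t>(2, 2) == 0x33333333u, "2D, runs of 2");
static_assert(make_mask<std::uint32_t>(3, 1) == 0x49249249u, "3D, runs of 1");
```

For example, deinterleave_sketch<3>(code >> k) would extract coordinate k of a three-dimensional code, assuming coordinate k occupies bits k, k + 3, k + 6, and so on. Whether the loop is fully unrolled and the masks folded into immediates is left to the optimizer in this sketch.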
This has some downsides: for example, every parameter in the template classes has to be a constant expression (constexpr). I do not know an easy workaround for this, so I just defined more classes that perform some basic mathematical operations (e.g. log2) at compile time.
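A compile-time log2 of this kind could look roughly like the following. This is only a sketch of the technique: the class name Log2 is hypothetical and is not taken from zorder.h. It computes the floor of the base-2 logarithm by recursive template instantiation, so the result can be used wherever a constant expression is required.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from zorder.h): floor(log2(N)) computed at
// compile time. Each instantiation halves N and adds one.
template <std::size_t N>
struct Log2 {
    static_assert(N > 0, "log2 is undefined for 0");
    static constexpr std::size_t value = 1 + Log2<N / 2>::value;
};

// Base case: log2(1) == 0 ends the recursion.
template <>
struct Log2<1> {
    static constexpr std::size_t value = 0;
};

// The result is a constant expression, so it can feed other templates.
static_assert(Log2<32>::value == 5, "32 == 2^5");
```

Since C++14 the same computation can also be written as an ordinary constexpr function with a loop, which may be easier to read than a family of helper classes.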
This is the deinterleave_one function proposed on StackOverflow:
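uint32_t morton1(uint32_t x)
{
    x = x & 0x55555555;               // keep only the bits at even positions
    x = (x | (x >> 1)) & 0x33333333;  // pack them into runs of 2 bits
    x = (x | (x >> 2)) & 0x0F0F0F0F;  // runs of 4 bits
    x = (x | (x >> 4)) & 0x00FF00FF;  // runs of 8 bits
    x = (x | (x >> 8)) & 0x0000FFFF;  // one run of 16 bits
    return x;
}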
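As a usage sketch, the function above can be applied twice to split a two-dimensional Morton code into its coordinates. The Point2 struct and the decode2 wrapper are hypothetical names introduced for illustration, and the bit layout (x in the even bits, y in the odd bits) is an assumption; the opposite convention works the same way with the two calls swapped.

```cpp
#include <cassert>
#include <cstdint>

// Same function as above, repeated so the example is self-contained.
std::uint32_t morton1(std::uint32_t x) {
    x = x & 0x55555555;
    x = (x | (x >> 1)) & 0x33333333;
    x = (x | (x >> 2)) & 0x0F0F0F0F;
    x = (x | (x >> 4)) & 0x00FF00FF;
    x = (x | (x >> 8)) & 0x0000FFFF;
    return x;
}

// Hypothetical result type for a 2D coordinate pair.
struct Point2 {
    std::uint32_t x;
    std::uint32_t y;
};

// Hypothetical wrapper. Assumes x is stored in the even bits and y in the
// odd bits of `code`; shifting by one moves the odd bits to even positions.
Point2 decode2(std::uint32_t code) {
    return Point2{morton1(code), morton1(code >> 1)};
}
```

With this layout, the code 39 (binary 100111) has x bits 1, 1 at positions 0 and 2, and y bits 1, 0, 1 at positions 1, 3 and 5, so decode2(39) gives x = 3 and y = 5.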
And this is the assembly code of deinterleave_one in the demo target, generated by clang 3.9.0 (x86_64-apple-darwin16.1.0) using -O3:
100000f35: andl $0x55555555, %eax
100000f3a: movl %eax, %ecx
100000f3c: shrl %ecx
100000f3e: orl %eax, %ecx
100000f40: andl $0x33333333, %ecx
100000f46: movl %ecx, %eax
100000f48: shrl $2, %eax
100000f4b: orl %ecx, %eax
100000f4d: andl $0xF0F0F0F, %eax
100000f52: movl %eax, %ecx
100000f54: shrl $4, %ecx
100000f57: orl %eax, %ecx
100000f59: movzbl %cl, %esi
100000f5c: shrl $8, %ecx
100000f5f: andl $0xFF00, %ecx
100000f65: orl %ecx, %esi
This looks about as optimized as it gets without using BMI2 instructions such as pext.
The code was inspired by this question on StackOverflow: How to efficiently de-interleave bits (inverse Morton). The code proposed there seems fairly efficient, and I wanted to generalize it to arbitrary dimensions and different integral types.
Also, I was trying to understand which is faster, row-major order or z-order, in terms of the runtime of the index-to-coordinate conversion. For this reason the code includes a small comparative benchmark between std::div and deinterleave_all.
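The two conversions being compared can be sketched side by side for the two-dimensional case. This is not the benchmark code from the project: Coord, from_row_major, from_z_order and compact_even_bits are hypothetical names, and the z-order layout (x in the even bits, y in the odd bits) is an assumption. The sketch only shows what each conversion computes and makes no claim about which one is faster.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical coordinate pair shared by both conversions.
struct Coord {
    std::uint32_t x;
    std::uint32_t y;
};

// Row-major: index == y * width + x, so a single std::div call yields
// y as the quotient and x as the remainder.
Coord from_row_major(int index, int width) {
    assert(index >= 0 && width > 0);
    const std::div_t qr = std::div(index, width);
    return Coord{static_cast<std::uint32_t>(qr.rem),
                 static_cast<std::uint32_t>(qr.quot)};
}

// Gathers the even-position bits of v into the low 16 bits.
std::uint32_t compact_even_bits(std::uint32_t v) {
    v &= 0x55555555;
    v = (v | (v >> 1)) & 0x33333333;
    v = (v | (v >> 2)) & 0x0F0F0F0F;
    v = (v | (v >> 4)) & 0x00FF00FF;
    v = (v | (v >> 8)) & 0x0000FFFF;
    return v;
}

// Z-order: assumes x lives in the even bits and y in the odd bits.
Coord from_z_order(std::uint32_t code) {
    return Coord{compact_even_bits(code), compact_even_bits(code >> 1)};
}
```

For the cell (x = 1, y = 3) in a grid of width 4, the row-major index is 13 and, under the assumed layout, the z-order code is 11; both functions map their index back to the same coordinates.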