ZOrder

This project is a small header file that generates the constants and code needed to deinterleave Morton numbers (indices along a Z-order curve) with an arbitrary number of dimensions, stored in an integer of arbitrary size.
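To make "deinterleave" concrete, here is a minimal reference sketch that extracts one coordinate bit by bit. It is not the approach used in zorder.h; the name deinterleave_naive, its parameters, and the convention that dimension 0 occupies the lowest bit are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference helper (not part of zorder.h): collects every
// `dims`-th bit of `z`, starting at bit `dim`, into a compact integer.
inline std::uint32_t deinterleave_naive(std::uint32_t z, unsigned dims, unsigned dim)
{
    assert(dims > 0 && dim < dims);
    std::uint32_t out = 0;
    for (unsigned src = dim, dst = 0; src < 32; src += dims, ++dst) {
        out |= ((z >> src) & 1u) << dst;
    }
    return out;
}

// Worked example: in two dimensions, x uses the even bits and y the odd bits.
inline void deinterleave_naive_example()
{
    const std::uint32_t z = 0x1Bu;              // binary 011011
    assert(deinterleave_naive(z, 2, 0) == 5u);  // bits 0, 2, 4 are 1, 0, 1 -> binary 101
    assert(deinterleave_naive(z, 2, 1) == 3u);  // bits 1, 3, 5 are 1, 1, 0 -> binary 011
}
```

A loop like this is easy to check by hand, which makes it a convenient baseline for testing a faster shift-and-mask implementation.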

Project structure:

zorder.h: main header file which contains the deinterleave function.
benchmark: binary target which generates a small executable that compares the speed of std::div vs deinterleave.
demo: minimal binary target that just calls deinterleave_one on the standard input. It can be used to easily inspect the generated assembly.

Notes

To make the program as efficient as possible, all the constants and the recursive functions are generated via template metaprogramming; when compiling optimized code (e.g. using -O3), most of the types and functions are stripped out to leave very simple assembly code.
This has some downsides: for example, every template parameter has to be a compile-time constant. I do not know of an easy workaround for this, so I defined additional classes that perform basic mathematical operations (e.g. log2) at compile time.
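As an illustration of the kind of helper class mentioned above, a compile-time log2 could be sketched with template recursion as follows. The name Log2 and its interface are hypothetical; the actual classes in zorder.h may look different.

```cpp
#include <cstddef>

// Hypothetical helper, not necessarily the class used in zorder.h:
// Log2<N>::value is floor(log2(N)), computed by template recursion.
template <std::size_t N>
struct Log2 {
    static_assert(N > 0, "log2 is undefined for 0");
    static constexpr std::size_t value = 1 + Log2<N / 2>::value;
};

// Base case that ends the recursion.
template <>
struct Log2<1> {
    static constexpr std::size_t value = 0;
};

// The results are constant expressions, so they can be checked at
// compile time or passed on as template arguments.
static_assert(Log2<1>::value == 0, "log2(1) == 0");
static_assert(Log2<32>::value == 5, "log2(32) == 5");
static_assert(Log2<33>::value == 5, "result is rounded down");
```

Because the value is computed entirely by the compiler, nothing from this class needs to remain in the optimized binary.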

This is the deinterleave_one function proposed on StackOverflow:

uint32_t morton1(uint32_t x)
{
    x = x & 0x55555555;
    x = (x | (x >> 1)) & 0x33333333;
    x = (x | (x >> 2)) & 0x0F0F0F0F;
    x = (x | (x >> 4)) & 0x00FF00FF;
    x = (x | (x >> 8)) & 0x0000FFFF;
    return x;
}

And this is the assembly code of deinterleave_one in the demo target generated by clang 3.9.0 (x86_64-apple-darwin16.1.0) using -O3:

100000f35:  andl    $0x55555555, %eax
100000f3a:  movl    %eax, %ecx
100000f3c:  shrl    %ecx
100000f3e:  orl     %eax, %ecx
100000f40:  andl    $0x33333333, %ecx
100000f46:  movl    %ecx, %eax
100000f48:  shrl    $2, %eax
100000f4b:  orl     %ecx, %eax
100000f4d:  andl    $0xF0F0F0F, %eax
100000f52:  movl    %eax, %ecx
100000f54:  shrl    $4, %ecx
100000f57:  orl     %eax, %ecx
100000f59:  movzbl  %cl, %esi
100000f5c:  shrl    $8, %ecx
100000f5f:  andl    $0xFF00, %ecx
100000f65:  orl     %ecx, %esi

This looks about as optimized as it gets without using BMI instructions.
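The shift-and-mask pattern of morton1 can be extended to other dimensions and integer widths. The sketch below is one possible way to do so, written with plain loops instead of template recursion, so it does not reproduce the code in zorder.h; the names repeating_mask and deinterleave_sketch are hypothetical. An optimizing compiler may fold the masks into constants, but that is not guaranteed here.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: mask with runs of `run` set bits repeating every
// `period` bits, starting at bit 0 (run = 1, period = 2 gives 0x5555...).
template <typename T>
T repeating_mask(unsigned run, unsigned period)
{
    const unsigned bits = std::numeric_limits<T>::digits;
    T mask = 0;
    for (unsigned b = 0; b < bits; ++b) {
        if (b % period < run) {
            mask = static_cast<T>(mask | (T(1) << b));
        }
    }
    return mask;
}

// Hypothetical generalization of morton1: gathers every D-th bit of x
// (starting at bit 0) into the low bits of the result.
template <unsigned D, typename T>
T deinterleave_sketch(T x)
{
    static_assert(std::is_unsigned<T>::value, "use an unsigned integer type");
    static_assert(D >= 1, "at least one dimension is required");
    const unsigned bits = std::numeric_limits<T>::digits;

    x = static_cast<T>(x & repeating_mask<T>(1, D));  // keep every D-th bit
    // Invariant: the kept bits sit in runs of `run` bits spaced D * run apart.
    // Shifting by (D - 1) * run merges neighbouring runs, doubling the run length.
    for (unsigned run = 1; D * run < bits; run *= 2) {
        const T merged = static_cast<T>(x | (x >> ((D - 1) * run)));
        x = static_cast<T>(merged & repeating_mask<T>(2 * run, 2 * D * run));
    }
    return x;
}
```

For D = 2 and a 32-bit input, the loop performs the same four shifts (1, 2, 4, 8) and uses the same masks as morton1. To read another coordinate, shift the input right by the dimension index first, e.g. deinterleave_sketch<2>(z >> 1) for the odd bits.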

Motivation

The code was inspired by this question on StackOverflow: How to efficiently de-interleave bits (inverse Morton). The code proposed there seems fairly efficient, and I wanted to generalize it to arbitrary dimensions and different integral types.

I was also trying to understand which is faster, row-major order or Z-order, in terms of the runtime of the index-to-coordinate conversion. For this reason, the code includes a small comparative benchmark between std::div and deinterleave_all.
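The two index-to-coordinate conversions being compared can be sketched as follows for the 2D case. This is not the benchmark code itself: the function names are hypothetical, even_bits simply reuses the constants of morton1 from the Notes section, and placing x in the even bits and y in the odd bits is an assumed convention.

```cpp
#include <cstdint>
#include <cstdlib>
#include <utility>

// Same shift-and-mask steps as morton1: gathers the even bits of x.
inline std::uint32_t even_bits(std::uint32_t x)
{
    x = x & 0x55555555;
    x = (x | (x >> 1)) & 0x33333333;
    x = (x | (x >> 2)) & 0x0F0F0F0F;
    x = (x | (x >> 4)) & 0x00FF00FF;
    x = (x | (x >> 8)) & 0x0000FFFF;
    return x;
}

// Row-major layout: index = y * width + x, so one division yields both.
inline std::pair<int, int> row_major_to_xy(int index, int width)
{
    const std::div_t qr = std::div(index, width);
    return std::make_pair(qr.rem, qr.quot);  // (x, y)
}

// Z-order layout (assumed convention): x in the even bits, y in the odd bits.
inline std::pair<std::uint32_t, std::uint32_t> z_order_to_xy(std::uint32_t index)
{
    return std::make_pair(even_bits(index), even_bits(index >> 1));  // (x, y)
}
```

For example, index 27 in a row-major grid of width 8 maps to (x, y) = (3, 3), while the same index on a Z-order curve maps to (5, 3). The row-major path costs an integer division, whereas the Z-order path uses only shifts, ORs, and ANDs, but twice.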
