DiscreteLogarithm / canonical-random-float

Efficient canonical floating-point random number generator with fully random fraction bits


Usually, to produce a floating-point random number between 0 and 1, the output of an integer random number generator is divided by its range. For example, the output of a random number generator producing integers in [0, 2^32) is divided by 2^32. The actual result of this division is a fixed-point number, however. Fixed-point numbers are uniformly spaced and have constant absolute error. On the other hand, the spacing of floating-point numbers depends on their magnitude, and it is their relative error that is (almost) constant. Single-precision floating-point numbers have a precision of 2^-149 near 0, while the precision for values in [0.5, 1] is only 2^-24.

The common practice of dividing a random integer by its range and storing the result in a floating-point variable inherits the worst of both worlds: a precision of only 2^-32 near zero, due to the fixed-point nature of the original value, and a precision of 2^-24 near one, due to its conversion to floating-point. The single-precision floating-point format provides 24 significant bits for all normalized numbers (numbers greater than or equal to 2^-126), but the result of this common practice loses significant bits near zero, and the smallest non-zero value produced (2^-32) has only one significant bit. An ideal floating-point random number generator should be able to produce all values representable by a floating-point number (between 0 and 1) with their corresponding probability. This library does exactly that.
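These figures are easy to check numerically. The snippet below (illustrative, not part of the library) prints the smallest positive single-precision value, the spacing of floats just below 1, and the smallest non-zero value the divide-by-range method can produce:

#include <cstdio>
#include <cmath>
#include <limits>

int main () {
  // Smallest positive single-precision value (subnormal): 2^-149
  std::printf ("denorm_min      = %g\n", std::numeric_limits<float>::denorm_min ());
  // Spacing of single-precision floats just below 1: 2^-24
  std::printf ("spacing near 1  = %g\n", 1.0f - std::nextafterf (1.0f, 0.0f));
  // Smallest non-zero value the divide-by-range method produces: 2^-32,
  // which carries only a single significant bit
  std::printf ("divide-by-range = %g\n", 1.0f / 4294967296.0f);
  return 0;
}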

What are the implications of the precision loss and of not producing such small values?

Many algorithms use log or similar functions to map values between 0 and 1 to an infinite range. The largest value actually produced is limited by the smallest value fed into the log, which effectively truncates the tail of the distribution. For example, the largest value that the Box-Muller algorithm can produce is 6.66 when the smallest float fed into it is 2^-32, but 14.37 when the smallest single-precision float is fed into it. Similarly, an exponential variate is generated by taking the logarithm of a uniform variate. When the smallest value used is 2^-32, the largest standard exponential variate would be 22.18, but it is 103.28 when the smallest single-precision float is used. Not only is the tail of the distribution truncated prematurely, but the loss of significant bits also causes large gaps in the tail near the truncation point. For example, in the exponential distribution there is a gap of 0.69, just before the truncation point, where no random number can occur.
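These truncation points follow directly from the formulas: sqrt(-2 ln u) for the Box-Muller magnitude and -ln u for the standard exponential. A small snippet (not part of the library) reproduces the numbers quoted above:

#include <cstdio>
#include <cmath>

int main () {
  // Largest Box-Muller magnitude: sqrt(-2 ln u_min)
  std::printf ("Box-Muller,  u_min = 2^-32 : %.2f\n", std::sqrt (-2 * std::log (std::pow (2.0, -32))));   // 6.66
  std::printf ("Box-Muller,  u_min = 2^-149: %.2f\n", std::sqrt (-2 * std::log (std::pow (2.0, -149)))); // 14.37
  // Largest standard exponential variate: -ln u_min
  std::printf ("exponential, u_min = 2^-32 : %.2f\n", -std::log (std::pow (2.0, -32)));   // 22.18
  std::printf ("exponential, u_min = 2^-149: %.2f\n", -std::log (std::pow (2.0, -149))); // 103.28
  // Gap just before truncation: -ln(2^-32) - (-ln(2^-31)) = ln 2
  std::printf ("gap before truncation      : %.2f\n", std::log (2.0));                   // 0.69
  return 0;
}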

The probability of having a number that far in the tail is very small, however. Depending on the application, such accuracy may be unnecessary.

But won't it be much slower than just dividing a random integer by its range?

No, it's not much slower. When both methods use a modern random number generator, such as the Mersenne Twister, to produce the random integers, this method is only about 10% slower. It needs about 1 + 2^-9 32-bit random integers per single-precision float and 1 + 2^-12 64-bit random integers per double-precision float on average.

How to install it?

There's no installation required. The library is a single header file, canonical_float_random.hpp, which you can either copy into your source directory or into the system's include path.

How to use it?

#include <random>
#include <iostream>
#include "canonical_float_random.hpp"

int main () {
  // Distribution producing canonical floats from std::mt19937 output
  constexpr canonical_float_random<float, std::mt19937> canonical_dist;
  std::mt19937 gen;
  // Draw one canonical single-precision random number in [0, 1)
  std::cout << canonical_dist (gen) << std::endl;
  return 0;
}
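For double precision, pairing the distribution with a 64-bit generator such as std::mt19937_64 presumably works the same way; this is an assumption based on the template parameters shown above and on the 64-bit integer consumption figures quoted earlier, so check the header for the exact interface:

#include <random>
#include <iostream>
#include "canonical_float_random.hpp"

int main () {
  // Assumed double-precision variant: same template, 64-bit generator
  constexpr canonical_float_random<double, std::mt19937_64> canonical_dist;
  std::mt19937_64 gen;
  std::cout << canonical_dist (gen) << std::endl;
  return 0;
}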

How does it work?

Note that a canonical uniform variate is in [2^-1, 2^0) with probability 2^-1, in [2^-2, 2^-1) with probability 2^-2, in [2^-3, 2^-2) with probability 2^-3, and so on. This is a geometric distribution with p = 1/2. So the fraction part of the floating-point number is initialized with a uniform variate in [0.5, 1), which is then multiplied by 2 raised to the power of a negative geometric variate. All floating-point values in the interval [0.5, 1) are evenly spaced, just like fixed-point numbers.
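As a rough illustration of this construction (a conceptual sketch, not the library's actual implementation; it uses std::geometric_distribution for the exponent instead of the bit tricks described below):

#include <cmath>
#include <random>

// Conceptual sketch: a fraction in [0.5, 1) scaled by 2 raised to
// the power of a negative geometric variate.
float canonical_sketch (std::mt19937 &gen) {
  // 23 random bits yield an evenly spaced fraction in [0.5, 1)
  float fraction = 0.5f + (gen () & 0x7FFFFFu) * 0x1.0p-24f;  // 2^-24
  // Geometric variate with p = 1/2 (number of failures before a success)
  std::geometric_distribution<int> geometric (0.5);
  int g = geometric (gen);           // g = 0 w.p. 1/2, g = 1 w.p. 1/4, ...
  return std::ldexp (fraction, -g);  // result lies in [2^-(g+1), 2^-g)
}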

In a uniform random integer, each bit is 0 or 1 independently with probability 1/2; therefore the position of the first non-zero bit in such an integer follows a geometric distribution. An n-bit integer could be zero with probability 2^-n. In that case we need to generate a new random integer and add n to the position of the first non-zero bit in the new number (repeating the procedure if necessary).
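A minimal sketch of this spill-over logic, assuming C++20's std::countr_zero and counting bit positions from the least-significant end:

#include <bit>
#include <cstdint>
#include <random>

// Sketch: geometric variate (p = 1/2) from the position of the first
// set bit, drawing fresh integers whenever one is all zeros.
int geometric_sketch (std::mt19937 &gen) {
  int count = 0;
  std::uint32_t bits = gen ();
  while (bits == 0) {  // all 32 bits zero: probability 2^-32
    count += 32;       // credit the exhausted bits ...
    bits = gen ();     // ... and draw a fresh integer
  }
  return count + std::countr_zero (bits) + 1;  // position of first set bit
}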

Just enough bits to randomize the fraction part are taken from the random integer, and the rest are used to generate the geometric variate. For example, when generating single-precision floating-point numbers from 32-bit integers, 23 bits are used for the fraction and the remaining 9 bits are used to generate the geometric variate. Only with probability 2^-9 are extra random integers needed for the geometric variate, so most of the time a single random integer suffices per floating-point number.
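Putting the pieces together, a hypothetical single-precision generator along these lines might look as follows (again a sketch under the assumptions above, not the library's code):

#include <bit>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical generator illustrating the bit budget: 23 bits randomize
// the fraction, the remaining 9 bits feed the geometric variate.
float canonical_from_one_integer (std::mt19937 &gen) {
  std::uint32_t r = gen ();
  float fraction = 0.5f + (r & 0x7FFFFFu) * 0x1.0p-24f;  // low 23 bits
  std::uint32_t geom_bits = r >> 23;                     // remaining 9 bits
  int g;
  if (geom_bits != 0) {
    g = std::countr_zero (geom_bits);  // first set bit among the 9 bits
  } else {
    // Probability 2^-9: the 9 bits were all zero, draw extra integers
    g = 9;
    std::uint32_t extra = gen ();
    while (extra == 0) { g += 32; extra = gen (); }
    g += std::countr_zero (extra);
  }
  return std::ldexp (fraction, -g);  // [2^-(g+1), 2^-g) w.p. 2^-(g+1)
}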

License: GNU General Public License v3.0