Possible degradation to 0 in random math

Question

Possible degradation to 0 in random math

SChernykh opened this issue 6 years ago · comments

int src1 = kiss99(prog_rnd) % PROGPOW_REGS;
int src2 = kiss99(prog_rnd) % PROGPOW_REGS;

If src1 == src2 and we do XOR, result will be 0. This 0 will most likely spread because

0 * b = 0, a * 0 = 0
mul_hi(0, b) = 0, mul_hi(a, 0) = 0
ROTL32(0, b) = 0
ROTR32(0, b) = 0
0 & b = 0, a & 0 = 0
min(0, b) = 0, min(a, 0) = 0

The fix is to never do math operations that cancel out both arguments to 0. As far as I can see, it's only XOR currently. ASIC can add optimizations for the case when one of the numbers is 0.

Moreover, the case when src1 == src2 allows many other optimizations for ASIC that OpenCL/GPU won't do. It can use squarer instead of full multiplier for multiplication (more energy efficient), MIN/AND/OR simply become NOP, ADD becomes SHL by 1, CLZ/POPCOUNT become 2 times simpler/energy efficient. OpenCL compiler, on the other hand, is not guaranteed to take advantage of this. Compiler will be able to remove MIN/AND/OR from the generated code if src1 == src2, but it's unlikely to do more.

Again, the fix for all this is simple: never do math on the same register, always use two different registers:

int src_index = kiss99(prog_rnd) % (PROGPOW_REGS * (PROGPOW_REGS - 1));
int src1 = src_index % PROGPOW_REGS; // 0 <= src1 < PROGPOW_REGS
int src2 = src_index / PROGPOW_REGS; // 0 <= src2 < PROGPOW_REGS - 1

// src2 is not the final index yet
// Example: if we have 5 registers and src1 = 1, src2 = 3
// src1: 0 _1_ 2 3 4
// src2 = 3, but it's an index in the list of remaining registers: 0 2 3 _4_
// so the final index for src2 will be 4 = 3 + 1

if (src2 >= src1) ++src2; // 0 <= src2 < PROGPOW_REGS and src2 != src1

Paweł Bylica · Answer 1 · Tue Dec 04 2018 19:44:07 GMT+0800 (China Standard Time)

I like the proposed solution but int src_index = kiss99(prog_rnd) % (PROGPOW_REGS * (PROGPOW_REGS - 1)); is not uniformly distributed - numbers from 0 to 127 have slightly higher probability.

SChernykh · Answer 2 · Tue Dec 04 2018 19:53:02 GMT+0800 (China Standard Time)

It was just an example of how to generate two different random numbers, I didn't mean to use it as it is. Since this code is only used for generation, straight-forward approach "generate src1, src2 randomly, repeat if they are the same" will also work.

Paweł Bylica · Answer 3 · Tue Dec 04 2018 19:59:05 GMT+0800 (China Standard Time)

I like this solution because it is nicely optimized by a compiler and does not have a branch.

Another proposed one was:

    unsigned r;
    do {
        r = rand();
    }    
    while ((r & 31) == ((r>>16) & 31));

    return {r & 31, (r>>16) & 31};

I don't like it though because of the loop used. We will have to find test cases where the loop is executed more than once and include them in the test vectors.

SChernykh · Answer 4 · Tue Dec 04 2018 20:03:53 GMT+0800 (China Standard Time)

If kiss99 passes TestU01 as said in documentation, non-uniformness is not really a problem. The bias will be very small, something like (32*31)/2^32 ~= 2.3*10^-7

P.S. First 128 combinations will be chosen with probability 4329605/2^32, the rest 864 combinations will be chosen with probability 4329604/2^32. Not a big deal.

ifdefelse · Answer 5 · Thu Dec 06 2018 07:22:00 GMT+0800 (China Standard Time)

This is a good catch, thought not a significant issue.

First the OpenCL/CUDA compilers are extremely good and definitely do the transforms described above. They definitely do things like min(a,a)=a or popc(a)+popc(a)=popc(a)<<1. The only way an ASIC would have an advantage would be if it had a dedicated squaring unit. However it would only be used for 1/32*1/11=0.3% of random math ops, so doesn't seem worth the area.

Second the case of a^a=0 (or a|b=0xffffffff) is not a problem. The result of math() directly feeds a merge(), which was specifically designed to handle the B input being low entropy by XOR or ADDing in the B input. The mix state will not catastrophically collapse to 0 as in the above example.

That said there's no reason not to fix this. You suggestion for generating src1/src2 from a single random call seems reasonable. I agree that a 1/2^32 bias doesn't make any meaningful difference.

At the same time we should also update the rotate amount in merge() from ((r >> 16) % 32) to (((r >> 16) % 31)+1) to prevent rotating by 0 which is a NOP.

ifdefelse · Answer 6 · Tue Dec 11 2018 00:50:46 GMT+0800 (China Standard Time)

V0.9.2 fixes this both src1==src2 and rotate by 0. Thanks for this suggestion.
commit 824cd79