Can't compute block entropy when k > 31

Question

kjaquier opened this issue 5 years ago

This looks like an integer overflow: block_entropy.c appears to multiply the base b by itself k times (computing b^k) without any overflow check.
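A minimal sketch of the suspected failure mode, mimicking repeated multiplication in a fixed-width unsigned integer. The 32-bit width is an assumption (c_ulong is 32 bits on Windows), and wrapped_power is a hypothetical helper for illustration, not an Inform function.

```python
def wrapped_power(base, k, bits=32):
    """Compute base**k as a fixed-width unsigned integer would, wrapping modulo 2**bits."""
    mask = (1 << bits) - 1
    acc = 1
    for _ in range(k):
        acc = (acc * base) & mask  # discard the bits that do not fit
    return acc


# Compare the wrapped result with the exact value around the failing block size.
for k in (30, 31, 32, 33):
    print(k, wrapped_power(2, k), 2 ** k)
```

With base 2 the wrapped value becomes 0 at k = 32, which would be consistent with k = 31 being the last block size that works.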

I hit this through PyInform's block_entropy function, but the underlying problem seems to be in Inform itself.

Code:

import numpy as np
from pyinform.blockentropy import block_entropy

# Random binary time series of length 100
x = (np.random.random([100]) > .5).astype(np.uint8)
for k in range(1, 50):
    print(k, block_entropy(x, k))

Output:

1 0.9953784388202257
2 1.9878129812393763
...
31 6.129283016944973

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-311-af5262631fbd> in <module>
      3 x = (np.random.random([100]) > .5).astype(np.uint8)
      4 for k in range(1, 50):
----> 5     print(k, block_entropy(x, k))

~\AppData\Roaming\Python\Python36\site-packages\pyinform\blockentropy.py in block_entropy(series, k, local)
    109         _local_block_entropy(data, c_ulong(n), c_ulong(m), c_int(b), c_ulong(k), out, byref(e))
    110     else:
--> 111         ai = _block_entropy(data, c_ulong(n), c_ulong(m), c_int(b), c_ulong(k), byref(e))
    112 
    113     error_guard(e)

OSError: exception: access violation writing 0x00000217586956EC

A better solution would be to use the biggest int type available, or at least raise an appropriate error message.
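As a sketch of the "appropriate error message" option, a caller-side guard could check the number of possible blocks before calling into the library. The name check_block_size and the 32-bit default limit are assumptions for illustration; neither is part of Inform or PyInform.

```python
def check_block_size(base, k, index_bits=32):
    """Return base**k, or raise ValueError if it does not fit an unsigned index of index_bits bits."""
    if base < 2 or k < 1:
        raise ValueError("base must be >= 2 and k must be >= 1")
    limit = (1 << index_bits) - 1
    # Python integers have arbitrary precision, so this product cannot overflow.
    states = base ** k
    if states > limit:
        raise ValueError(
            f"{base}**{k} = {states} possible blocks exceeds the "
            f"{index_bits}-bit index limit of {limit}"
        )
    return states
```

Calling check_block_size(2, 32) raises a ValueError with a readable message instead of letting the call reach native code.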

Obviously, memory and computational complexity will always be limiting factors here. Any suggestions for working around this? Curve fitting has been suggested here, but in my case I don't think the block entropy converges fast enough to a "fittable" curve (it is supposed to approach a straight line whose slope is the entropy rate as k goes to infinity).

Douglas G. Moore · Answer 1 · Wed Sep 04 2019 07:07:24 GMT+0800 (China Standard Time)

@kjaquier Sorry, I just saw this. You must be running on a machine with quite a bit of memory, given that you seem to have hit an index overflow before running out of memory! We've fixed this problem in inform_transfer_entropy (0d50faa), but haven't propagated the changes to the rest of Inform. It'll probably take me a couple of weeks to take care of this and get it worked into PyInform.

In the meantime, there is a workaround that should postpone this error. The idea is to construct the k-history time series and then coalesce that time series to effectively reduce the base. From there you can just compute the k=1 block entropy on the result to get essentially the same answer (up to the usual floating-point madness). That is, essentially apply inform_black_box and then inform_coalesce.

That being said, inform_black_box hasn't been propagated to PyInform yet, and both functions will likely suffer from this same index overflow problem. This Gist is a pure Python implementation that should work, but keep in mind that it's only lightly tested and doesn't do any kind of error checking.
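For reference, the idea (build the k-histories, coalesce them into compact labels, then take the ordinary entropy) can be sketched in plain Python as follows. This is not the code from the Gist and does not call Inform or PyInform. All function names are hypothetical, and the entropy is assumed to be in bits (log base 2).

```python
from collections import Counter
from math import log2


def k_histories(series, k):
    """Return the overlapping length-k blocks of series as tuples."""
    return [tuple(series[i:i + k]) for i in range(len(series) - k + 1)]


def coalesce(states):
    """Relabel the observed states as consecutive integers 0, 1, 2, ..."""
    labels = {s: i for i, s in enumerate(sorted(set(states)))}
    return [labels[s] for s in states]


def entropy(series):
    """Shannon entropy (in bits) of the empirical distribution of series."""
    n = len(series)
    return -sum(c / n * log2(c / n) for c in Counter(series).values())


def block_entropy_sketch(series, k):
    """Block entropy computed as the k=1 entropy of the coalesced k-history series."""
    if not 1 <= k <= len(series):
        raise ValueError("k must be between 1 and len(series)")
    return entropy(coalesce(k_histories(series, k)))
```

Because only the blocks that actually occur are stored (at most len(series) - k + 1 of them), the cost depends on the length of the series rather than on b^k, so k = 40 on a series of length 100 poses no problem.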

I'll try to get all of these changes rolled out as soon as possible.