[Enhancement] Random.py Package Improvement
Evitcani opened this issue · comments
Background
A friend and I were looking into the avrae/d20 package, and we noticed some concerning code flow in how `Random` is being used. The implementation in use (random.py) is the old version of the MT algorithm, which tends toward statistically lower outcomes compared to the 2002-or-newer versions. It's also hit-or-miss given the lack of seeding, which means we'd expect the randomness to effectively be subsets of older rolls (deterministic for the purpose of statistical modeling). Also, it's one of the slowest random algorithms.
There are plenty of alternatives that are easy to use, provide better statistical quality, and offer quicker throughput.
Suggested Improvement
Upgrade to a new random package to be used on lines 405 and 407 of expression.py. A package in the PCG family would likely be optimal for Avrae's usage.
Quote from the page on comparison to other algorithms:
PCG was developed by applying TestU01 to reduced-size variants,[7] and determining the minimum number of internal state bits required to pass BigCrush. BigCrush examines enough data to detect a period of 2^35, so even an ideal generator requires 36 bits of state to pass it. Some very poor generators can pass if given a large enough state;[8] passing despite a small state is a measure of an algorithm's quality, and shows how large a safety margin exists between that lower limit and the state size used in practical applications.
PCG-RXS-M-XS (with 32-bit output) passes BigCrush with 36 bits of state (the minimum possible), PCG-XSH-RR (pcg32() above) requires 39, and PCG-XSH-RS (pcg32_fast() above) requires 49 bits of state. For comparison, xorshift*, one of the best of the alternatives, requires 40 bits of state,[5] and Mersenne twister fails despite 19937 bits of state.[9]
This could avoid the known issue of slow recovery from many-zero states in pre-2002 MT implementations, which random.py uses. The best outcome for Avrae would be faster generation.
I did some digging and it appears that the implementation of MT used by CPython's random is indeed MT2002 (as referenced at https://github.com/python/cpython/blob/main/Modules/_randommodule.c#L5). I'll look into this more later in the week.
Apologies for butting in, but these seemed relevant:
Good point on the PCG generators. I only pointed it out as it seemed the best suited (there are other, older generators that could be considered in its stead).
Just read your edit with this secondary link. It appears NumPy adopted a PCG algorithm in 2019 after a long discussion. The Python group seems hesitant due to its novelty (it's only a 6-7 year old algorithm, if I'm reading correctly). They know the Twister has problems, for which there are solutions.
A Xorshift generator might be better, as its failure modes are known and tested. It has some issues that make it difficult to achieve a long period without sufficient initialization; however, it is notably faster than the Twister, which would address the speedup considerations.
This isn't my show, but one possibility might be to opportunistically defer to a package like numpy, meaning don't make it an install requirement, but use it if present. (This is the approach I'm currently considering for dyce.)
I think that makes a lot of sense, and if we expose numpy as an extra dependency (i.e. installed via `d20[pcg]` or similar), it stays transparent to the user (in addition to docs, of course).
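For illustration, the optional-extra wiring could look something like this in setuptools metadata. This is a sketch only: the extra name `pcg` comes from the comment above, while the numpy version floor is an assumption, not a settled decision.

```python
# Sketch of the packaging metadata: numpy declared as an optional extra so
# that `pip install "d20[pcg]"` pulls it in while a plain install stays thin.
# (The extra name "pcg" and the version floor are illustrative.)
extras_require = {
    "pcg": ["numpy>=1.17"],
}

# This mapping would be passed to setuptools, e.g.:
#   setup(..., extras_require=extras_require)
```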
That sounds like a great plan.
Could consider the Random.org API library as an option for people who want to pay for pretty-near-true RNG, in addition to the NumPy option. I wouldn't mind making a fork for it, though it would require async HTTP requests with caching. It gets a bit nasty and would slow performance.
Haha! I think that's a little out of scope to bundle with d20 (especially since the package itself tries to make few assumptions about the environment it's running in re. sync/async), but if the plan is to make opportunistic usage of alternate RNG providers (e.g. numpy), it'll likely expose a `randrange_impl` function that can be easily monkey-patched to provide this functionality. I'll keep that in mind during implementation.
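As a sketch of what that hook could look like (the name `randrange_impl` comes from the comment above, but everything else here is hypothetical and not d20's actual API):

```python
import random

# Hypothetical module-level hook: defaults to the stdlib implementation.
randrange_impl = random.randrange


def roll_die(sides: int) -> int:
    """Roll one die via whatever RNG provider is currently installed."""
    return randrange_impl(1, sides + 1)


# A consumer (or an alternate RNG provider) can monkey-patch the hook with
# any callable matching randrange's signature, e.g. a seeded instance:
randrange_impl = random.Random(42).randrange
```

Because the hook is just a module attribute, swapping in something like Random.org-backed logic would be a one-line patch from the caller's side.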
I've done a little bit more reading into the articles/issues linked above and some articles linked to by them.
Reading list (in the order I read them):
- The Wrap-up on PCG Generators - This analysis comes to the conclusion that "there is technically no sensible reason to use a PCG generator: those without flaws are not competitive." That sounds like a pretty damning conclusion, but given the rest of my reading list, the fact that the NumPy group decided to adopt PCG64 as their default BitGenerator, and the potential conflict of interest of the writer (a researcher involved in the development of xoroshiro), I'm not convinced by this article.
- xoshiro/xoroshiro Generators and the PRNG Shootout - My most important takeaway from this article was the speed comparison/"quality" table, which includes the footprint, period, speed (in ns/64b), and statistical failures for the `xoroshiro128` family, `PCG64DXSM`, and `MT19937-64`. I'll refer back to this table later; the only remark for here is that the table does not report any failures for all 3 RNGs (except for linear complexity in the case of MT, the test defined on page 123 of the BigCrush paper).
- numpy/numpy#13635 - Notably, this is where the NumPy group decides on PCG64 as the default BitGenerator. While I saw xoroshiro brought up early in the discussion, I wasn't able to find too much discussion about the merits of one over the other in this thread.
- numpy/numpy#13793 - This is the PR where the xoshiro family of generators was removed from NumPy, citing an inability to reproduce passing BigCrush with this family. (This PR in turn links to https://www.pcg-random.org/posts/xoshiro-repeat-flaws.html, which concludes that "although mathematical theory can often show specific flaws, there are no known mathematical techniques to show that a PRNG scheme is actually good," seemingly refuting the claims in the first two articles I linked).
- numpy/numpy#16313
- Vigna, S. "It is high time we let go of the Mersenne Twister"
Personal Thoughts
I can't speak for the quality of PCG vs. XorShift, and this topic seems to be of some debate in the mathematical community. What does seem to be agreed upon, at least, is that both of these are "better" than Mersenne Twister.
With regards to speed, the table I mentioned in the second linked article seems to display these results:
PRNG | Footprint (bits) | BigCrush Failures | HWD Failures | ns/64b |
---|---|---|---|---|
xoroshiro128++ | 128 | -- | -- | 0.95 |
PCG64DXSM | 128 | -- | -- | 1.44 |
MT19937 | 20032 | LinearComp | -- | 2.19 |
This table suggests a 1.52x speedup in PCG64 over MT, and 2.31x when using xoroshiro128++ (other variants excluded for brevity). However, in the context of d20, the actual time it takes to generate a random number is effectively negligible compared to the time spent in other code (i.e. parsing/arithmetic evaluation/function call overhead/Python overhead). For example, the 2.19 ns/64b presented in the above table is less than 7% of the 32.7 ns it takes to generate a random number in Python (timed with `python -m timeit -s "from random import random" "random()"` on an i9 @ 2.3GHz), and less than 0.01% of the time `d20.roll("1d20")` takes (from the performance listed in the readme). When considered alongside Amdahl's Law, I believe the speed gains from the choice of RNG are not a major factor here.
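For anyone who wants to reproduce that measurement, the stdlib `timeit` one-liner from above can be wrapped as follows (results vary by machine; the 32.7 ns figure is from the comment's hardware, not a general truth):

```python
import timeit

# Time random.random() per call, in nanoseconds (machine-dependent).
n = 1_000_000
total_s = timeit.timeit("random()", setup="from random import random", number=n)
ns_per_call = total_s / n * 1e9
print(f"random.random(): {ns_per_call:.1f} ns/call")
```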
The most recent version of NumPy includes implementations of the Mersenne Twister, PCG64 (and the DXSM variant), Philox, and SFC64 (the latter two of which I have done no reading into, so I can't comment on them). Bringing this back to what it means for d20: I believe that if we were to implement some method for the user to supply which RNG to use (i.e. some interface for exposing a `randrange` implementation) and defaulted to a sane implementation (MT for a thin installation, PCG64 if numpy is installed), this would leave any library consumer in a good place.
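A minimal sketch of that opportunistic default (assuming only that numpy, when present, exposes `numpy.random.Generator` and `PCG64`; the `randrange_impl` name is carried over from earlier in the thread and is hypothetical):

```python
import random

try:
    # Prefer a PCG-backed generator when numpy is importable…
    from numpy.random import Generator, PCG64

    _gen = Generator(PCG64())

    def randrange_impl(start: int, stop: int) -> int:
        # Generator.integers(n) draws uniformly from [0, n).
        return start + int(_gen.integers(stop - start))

except ImportError:
    # …otherwise fall back to the stdlib Mersenne Twister.
    randrange_impl = random.randrange
```

Either branch yields a callable with the same contract, so the rest of the library never needs to know which backend is active.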
Great and thoughtful research that was quite fascinating to read. I agree with the conclusions you've drawn for moving forward with d20; it'd provide a great interface for the user.
My suggestion would be to include the links provided here in the external or code documentation, so users can reach their own thoughtful conclusions at their own pace.
I took a stab at implementing a `random.Random` with `numpy.random` primitives underneath. (I'm actually kind of surprised someone else hasn't done this. If they have, I was unable to find it. There are probably improvements over my own approach. If you think of any and feel like sharing, please let me know.)
In my case, I merely import the resulting `RNG` and use it like any other `random.Random` implementation.
That looks pretty clean! I think it might be worth implementing `getrandbits()` in your subclass as well, since NumPy's BitGenerator certainly implements it, and the Random class takes advantage of it if supplied to allow for arbitrarily large `randrange` (and I think it may be slightly faster in discrete cases, judging by the implementation of `_randbelow_with_getrandbits` vs `_randbelow_without_getrandbits` below).
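For reference, the `getrandbits`-based path boils down to rejection sampling. This is a stdlib-only sketch of the idea (not CPython's exact code, but the algorithm `_randbelow_with_getrandbits` implements):

```python
import random


def randbelow(n: int, getrandbits=random.getrandbits) -> int:
    """Uniform int in [0, n) via k-bit draws, rejecting out-of-range values."""
    k = n.bit_length()  # smallest k such that 2**k >= n
    r = getrandbits(k)
    while r >= n:  # rejection keeps the distribution uniform
        r = getrandbits(k)
    return r
```

Since any `getrandbits` source can be plugged in, a NumPy-backed implementation would slot straight into this path.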
Do you know if subclassing `random.Random` causes CPython to initialize the MT even if you don't use it? My intuition says that since you aren't calling `super().__init__()` it won't, but I'm not super familiar with the low-level workings of classes that eventually inherit from C extensions. (Also, `random.Random.__init__()` doesn't even call the super method from the C class it inherits from...)
Edit: It looks like the implementation of `BitGenerator.random_raw` differs slightly from `_random.getrandbits`, and I bet implementing Python logic to try and replicate this C logic would actually be slower than the implementation of `_randbelow_without_getrandbits` 😛.
Edit 2: Or maybe not! I bet you could replicate `SystemRandom.getrandbits` using `Generator.bytes`.
> Do you know if subclassing `random.Random` causes CPython to initialize the MT even if you don't use it? My intuition says that since you aren't calling `super().__init__()` it won't, but I'm not super familiar with the low-level workings of classes that eventually inherit from C extensions. (Also, `random.Random.__init__()` doesn't even call the super method from the C class it inherits from...)
I didn't take any time to look into initialization. 😞 I honestly hadn't thought about its implications. 😬 Not calling `super().__init__()` was probably an oversight on my part. Are you worried about performance on generator creation?
> Edit: It looks like the implementation of `BitGenerator.random_raw` differs slightly from `_random.getrandbits`, and I bet implementing Python logic to try and replicate this C logic would actually be slower than the implementation of `_randbelow_without_getrandbits` 😛.
>
> Edit 2: Or maybe not! I bet you could replicate `SystemRandom.getrandbits` using `Generator.bytes`.
Yeah, I think replacing `_urandom(numbytes)` in that implementation with `numpy.random.Generator.bytes` seems pretty straightforward:

```python
class NumpyRandom(Random):
    # …

    def getrandbits(self, k: int) -> int:
        # Adapted from random.SystemRandom. See
        # <https://github.com/python/cpython/blob/8c21941ddafdf4925170f9cea22e2382dd3b0184/Lib/random.py#L800-L806>.
        if k < 0:
            raise ValueError("number of bits must be non-negative")
        numbytes = (k + 7) // 8  # bits / 8 and rounded up
        x = int.from_bytes(self.randbytes(numbytes), "big")
        return x >> (numbytes * 8 - k)  # trim excess bits

    def randbytes(self, n: int) -> bytes:
        return self._g.bytes(n)
```
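The byte-trimming logic above can be exercised without NumPy by swapping in `os.urandom` as the byte source (which is what `random.SystemRandom` uses); a self-contained sanity check:

```python
import os


def getrandbits(k: int, randbytes=os.urandom) -> int:
    """Build a k-bit int from whole bytes, then shift off the excess bits."""
    if k < 0:
        raise ValueError("number of bits must be non-negative")
    numbytes = (k + 7) // 8  # bytes needed to cover k bits
    x = int.from_bytes(randbytes(numbytes), "big")
    return x >> (numbytes * 8 - k)  # trim down to exactly k bits


# Every result must fit in k bits:
assert all(getrandbits(13).bit_length() <= 13 for _ in range(1000))
```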
> Are you worried about performance on generator creation?

Curiosity more than anything, to be honest; since the MT is implemented in C it's pretty lightweight (relatively) anyway. Looking at the C code, it seems like it will allocate the space for the MT state (https://github.com/python/cpython/blob/main/Modules/_randommodule.c#L103) but never initialize it (which would normally be done here, which is only called by `seed`, which is overridden).
Also, the parent `random.Random` constructor itself calls `self.seed`, so I'd have to rethink my own `__init__` method. This or similar gymnastics just seem icky:

```python
class NumpyRandom(Random):
    # …

    def __init__(self, bit_generator: BitGenerator):
        self._g = Generator(bit_generator)
        super().__init__()
        self._g = Generator(bit_generator)
        # …
```
(FWIW, I'm not married to the implementation. I threw it together as a proof-of-concept over an hour or two without thinking too much about it.)
Maybe something like this?

```python
class NumpyRandom(random.Random):
    _gen: Generator

    def __init__(self, bitgen_type: Type[_BitGenT], x: _SeedT = None):
        self._bitgen_type = bitgen_type
        super().__init__(x)
        # ...

    def seed(self, a: _SeedT = None, version: int = 2):
        # ...
        self._gen = Generator(self._bitgen_type(a))
        # ...


if hasattr(numpy.random, "PCG64DXSM"):  # available in numpy 1.21 and up
    random_impl = NumpyRandom(numpy.random.PCG64DXSM)
```
I may be coming around to your proposal. I started to explore something like this:

```python
import random

from numpy.random import default_rng

# Same stuff that numpy.random.default_rng takes as its seed argument
_NumpySeed = Union[None, int, Sequence[int], BitGenerator, Generator]


class NumpyRandom(random.Random):
    _gen: Generator

    def __init__(self, numpy_seed: _NumpySeed):
        super().__init__(numpy_seed)

    def seed(
        self,
        a: _NumpySeed,  # type: ignore
        version: int = 0,  # ignored
    ) -> None:
        self._gen = default_rng(a)

    # …
```
So basically, one could do:

```python
from numpy.random import PCG64DXSM

anything_that_default_rng_would_take = PCG64DXSM(7)
rng = NumpyRandom(anything_that_default_rng_would_take)
# …
```
All good, right? Except:

```python
# …
anything_that_default_rng_would_take = [1, 2]
default_rng(anything_that_default_rng_would_take)  # this works
NumpyRandom(PCG64DXSM([1, 2]))  # this works
NumpyRandom(default_rng(anything_that_default_rng_would_take))  # this works
NumpyRandom(anything_that_default_rng_would_take)  # WTF?
```

Results in:

```
Traceback (most recent call last):
  File "/…….py", line …, in <module>
    NumpyRandom(anything_that_default_rng_would_take)  # WTF?
TypeError: unhashable type: 'list'
```
🤦‍♂️
So there's obviously some magic fighting against `Random` living up to its own marketing as a derivable interface (it seems the C-level `_random.Random.__new__` consumes the first constructor argument and hashes non-integer seeds before the Python-level `seed` override is ever consulted). To be fair, your proposal effectively side-steps this issue because the first argument to its `__init__` is a class, which is always hashable. I hate that one has to engage in such shenanigans, though….
Interesting! I think the implementation of `NumpyRandom(anything_that_default_rng_would_take)` creates a foot-gun, though: since this passes it directly to `NumpyRandom.seed` to initialize the backing generator, if a user with no understanding of this interaction calls `rng.seed(123)`, it's possible that they could be changing the implementation of the BitGenerator from an explicitly defined one to whatever happens to be exposed by `default_rng`.
If you wanted to avoid passing the class as an argument, one could store the bitgen class as a classvar:

```python
class NumpyRandom(random.Random, abc.ABC):
    _gen: Generator
    _bitgen_type: Type[BitGenerator]  # to be implemented by subclasses

    def __init__(self, x: _SeedT = None):
        super().__init__(x)
        # ...


class PCG64DXSMRandom(NumpyRandom):
    _bitgen_type = numpy.random.PCG64DXSM
    # ...
```

But I'm uncertain that doing that really adds much of value; I think it just boils down to code style preferences at that point.
You're right, of course. There's another foot-gun, in that NumPy "seeds" can be stateful:

```python
>>> from numpy.random import default_rng, PCG64DXSM
>>> seed = PCG64DXSM()
>>> generator = default_rng(seed)
>>> state = seed.state
>>> generator.random()
0.8390240576385848
>>> seed.state == state
False
>>> seed.state = state
>>> generator.random()
0.8390240576385848
```
So if you keep a reference to the BitGenerator outside of the `NumpyRandom` object, you can manipulate how the latter behaves (and use of the latter will change the state of the thing referenced). That and the case you point out make for some sharp edges that don't justify their presence.
I like the subclass thing. I took a stab (and solved the non-hashable seed problem) here.