aesara-devs / aesara

Aesara is a Python library for defining, optimizing, and efficiently evaluating mathematical expressions involving multi-dimensional arrays.

Home Page: https://aesara.readthedocs.io

Consider making `RandomStream`-generated `RandomVariable`s update RNGs in-place

brandonwillard opened this issue · comments

@aesara-devs/core, should we make RandomStream return RandomVariables with RandomVariable.inplace == True, instead of setting SharedVariable.default_update on the generated RandomTypeSharedVariables?

The reason we set SharedVariable.default_update is so that the aesara.function-compiled results will generate different samples between calls, as one would expect in a normal NumPy scenario. To do that, some form of global RNG state is required, and that's provided by the RandomTypeSharedVariable objects; however, those objects need to be updated in-place each time a sample is drawn, and it's the aesara.function updates mechanism that provides this in a very general way.

In the case of RandomVariable Ops, the RandomVariable.inplace attribute also provides this and is considerably more efficient, because it removes the copy of the RNG performed before sampling. Since samples drawn from RandomStream are always intended to update their RNGs in-place (albeit via the updates mechanism), it seems like we should just use the Op-level in-placing and avoid both the additional overhead and the fundamentally problematic SharedVariable.default_update attribute altogether.
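The copy-then-update overhead, and why some updated state must be fed back between calls, can be sketched in plain NumPy (no Aesara required; `draw` is a hypothetical stand-in for a non-in-place RandomVariable.perform, not Aesara's actual code):

```python
import copy

import numpy as np

rng = np.random.default_rng(0)

def draw(rng):
    # Non-in-place behavior: copy the RNG, sample from the copy, and return
    # the advanced copy as the "updated" state that the updates mechanism
    # would feed back into the shared variable.
    rng = copy.deepcopy(rng)
    return rng, rng.standard_normal()

# Without feeding the updated state back, every call repeats the same draw:
_, x1 = draw(rng)
_, x2 = draw(rng)
assert x1 == x2

# With the update applied between calls, the draws differ:
rng_next, y1 = draw(rng)
_, y2 = draw(rng_next)
assert y1 != y2

# In-place behavior: mutate the shared RNG directly; no copy is needed and
# the state advances automatically.
z1 = rng.standard_normal()
z2 = rng.standard_normal()
assert z1 != z2
```

The deep copy in `draw` is the per-call overhead that Op-level in-placing removes.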

(N.B. We have an aesara.tensor.random.rewriting.basic.random_make_inplace rewrite that replaces RandomVariable Ops with in-placed versions under the FAST_RUN compilation mode.)

The underlying problem is that by setting SharedVariable.default_update in RandomStream.gen, we're adding "state" to the RandomTypeSharedVariables we produce (i.e. state that unnecessarily associates a RandomTypeSharedVariable with a specific sample graph), and this state makes it difficult to reuse existing RandomTypeSharedVariables in rewrites. It can also easily lead to the introduction of old, unwanted update graphs, the end result being that we compile and sample from a completely different graph than the one we intended just to update a shared RNG object. A full illustration of the problem is provided here.
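A minimal mock of the statefulness problem (not Aesara's actual API; `Shared` and `compile_fn` are invented stand-ins for shared variables and the aesara.function updates mechanism):

```python
class Shared:
    """Hypothetical stand-in for a shared variable carrying a default_update."""

    def __init__(self, value):
        self.value = value
        self.default_update = None  # an "update graph" attached to this variable

def compile_fn(output, shared_inputs):
    """Hypothetical stand-in for aesara.function's updates mechanism."""
    def fn():
        result = output(shared_inputs)
        # The updates mechanism implicitly pulls in each shared input's
        # default_update graph, even when that graph belongs to an old,
        # unrelated sample graph.
        for s in shared_inputs:
            if s.default_update is not None:
                s.value = s.default_update(s.value)
        return result
    return fn

rng = Shared(0)
rng.default_update = lambda v: v + 10  # update graph from an *old* sample graph

# A new graph that reuses the same shared "RNG":
new_graph = lambda inputs: inputs[0].value + 1

f = compile_fn(new_graph, [rng])
result = f()
assert result == 1
assert rng.value == 10  # the stale update graph still mutated the shared state
```

The point of the sketch: the update logic travels with the shared variable itself, so reusing that variable in a new graph silently drags the old update graph along with it.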

In general, we should remove SharedVariable.default_update entirely, because it severely complicates several parts of Aesara. The only reason we haven't removed it is its use in this one case.

N.B. This idea has come up once before, but I guess we didn't have an explicit issue for it.

> @aesara-devs/core, should we make RandomStream return RandomVariables with RandomVariable.inplace == True, instead of setting SharedVariable.default_update on the generated RandomTypeSharedVariables?

Yes.

> The reason we set SharedVariable.default_update is so that the aesara.function-compiled results will generate different samples between calls, as one would expect in a normal NumPy scenario.

I think it's fine to break this NumPy "compatibility", since we're effectively saying that the context for a call includes the state of the RandomStream. NumPy assumes this too; its state is just updated globally.
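For comparison, NumPy's legacy global-state API makes exactly this assumption: the result of each call depends on, and advances, the global RNG state:

```python
import numpy as np

np.random.seed(123)
a = np.random.normal()
b = np.random.normal()
assert a != b  # the global state advanced between the two calls

# The "context" of a call is the global RNG state: resetting that state
# reproduces the first draw exactly.
np.random.seed(123)
c = np.random.normal()
assert c == a
```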

> I think it's fine to break this NumPy "compatibility", since we're effectively saying that the context for a call includes the state of the RandomStream. NumPy assumes this too; its state is just updated globally.

Luckily, the end result of this proposed change wouldn't actually change any important user-facing behavior, aside from the fact that in-place RNG updating could no longer be disabled by disabling default updates (e.g. via the no_default_updates option to aesara.function). Basically, with these changes, when a graph is created with in-place RNG updating enabled, it stays that way, because in-placing is set at the Op level and isn't determined by the user's use or non-use of updates.

Here's the primary reason such a change wasn't made earlier:

Theano was designed to be "functional", in that its graphs were expected to contain objects with little or no state (and no loops). More importantly, the identity of RandomVariable nodes is tied to their inputs (as is the case for all Apply nodes), and updating RNG objects in-place complicates this situation, because two RandomVariable nodes with the same in-place-updated RNG inputs aren't actually equal.

This is the design issue we would need to address.
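The identity problem can be sketched with a toy Apply-like node class (hypothetical, not Aesara's implementation), where node identity is keyed on the op and its inputs:

```python
import numpy as np

class Node:
    """Toy Apply-like node: identity is determined solely by the op and the
    identity of its inputs (a rough analogue of Apply-node equality)."""

    def __init__(self, op, inputs):
        self.op, self.inputs = op, tuple(inputs)

    def __eq__(self, other):
        return (
            self.op == other.op
            and len(self.inputs) == len(other.inputs)
            and all(a is b for a, b in zip(self.inputs, other.inputs))
        )

    def __hash__(self):
        return hash((self.op, tuple(id(i) for i in self.inputs)))

    def evaluate(self):
        # In-place sampling mutates the shared RNG input.
        return self.inputs[0].standard_normal()

rng = np.random.default_rng(0)
n1 = Node("normal", [rng])
n2 = Node("normal", [rng])

assert n1 == n2     # same op, same inputs: structurally "equal" nodes...
v1 = n1.evaluate()  # ...but evaluating n1 mutates the shared RNG in-place,
v2 = n2.evaluate()  # so the "equal" node n2 yields a different sample
assert v1 != v2
```

Two nodes that compare equal by their inputs no longer denote the same value once one of those inputs is mutated in-place; that is the mismatch with the functional design.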

Just to be clear, I created this issue so that we can have a record of this approach and some important considerations regarding it.

In general, this issue relates directly to many other RandomVariable topics we've discussed in this repository, but most of those did/do not consider the introduction of some form of graph-level statefulness. (Ideally, we wouldn't even consider doing something like this, but our current use of SharedVariable.default_update leads to some severe development complications that warrant such considerations.)

For instance, going back to the basics of RandomVariable, one can (and probably should) use the output RNG state of one RandomVariable node as the input to the next RandomVariable node, describing the update process quite naturally at the graph level; however, this introduces long chains of dependencies between all the RandomVariable nodes, and those chains complicate things.
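The chained, graph-level form of RNG updating can be sketched in plain NumPy, where a hypothetical `normal_rv` (a stand-in for a RandomVariable-like op, not Aesara's API) returns an updated state that the next node consumes:

```python
import numpy as np

def normal_rv(rng_state):
    """Hypothetical RandomVariable-like op: consume an RNG state and return
    the updated state alongside the sample, so the update itself is part of
    the graph rather than a side effect."""
    rng = np.random.default_rng()
    rng.bit_generator.state = rng_state
    sample = rng.standard_normal()
    return rng.bit_generator.state, sample

s0 = np.random.default_rng(0).bit_generator.state
s1, x = normal_rv(s0)  # node 1 consumes s0 and emits s1
s2, y = normal_rv(s1)  # node 2's RNG input is node 1's RNG output
assert x != y          # the state threaded through the chain advanced
```

Note how node 2 can only be evaluated after node 1: threading the state this way is what creates the long serial dependency chains mentioned above.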

The discussion in #1251 describes the above very well, along with some possible improvements to the graph-level representation of RNG updates and RandomVariable. As mentioned in that discussion, those design improvements don't directly solve some of the efficiency issues (e.g. the need to copy RNG states before sampling in RandomVariable.perform, which is mostly an inherited NumPy issue) or usability issues (e.g. #898, #738) that one would hope to address more easily with explicit in-placing.

N.B. The approach in #1251 does fix some design issues that could help with the usability of chained RNG outputs, though, and that's important.