joschu / cgt

Computation Graph Toolkit

Conditionals (if-else) and Loops

joschu opened this issue

I'm creating this Issue to sketch out some ideas on how to implement conditionals and loops, which have come out of discussions with @nickjhay and @hojonathanho. These ideas will address Issue #23, among other things. This issue is not the top priority at the moment (better error messages, full GPU support, and other issues are higher on the list), but it can't hurt to start brainstorming about it now.

First, let's consider implementing an IfElse operation: if cond then x else y, where cond is boolean.
Right now, we iterate through the nodes of the computation graph in topological order and fire each node once its parents have been computed. Hence, x and y would both be computed, which is problematic if evaluating one of them would raise an out-of-bounds error (as pointed out in the #23 discussion).
This issue was raised early in Theano's development (see https://github.com/Theano/Theano/blob/master/doc/proposals/conditional.txt), and they implemented "linkers" (like CGT's interpreters) that allow for lazy execution, along with the idea of "thunks".

Implementing lazy evaluation in CGT would require changes to how the execution graph works. One possibility is to (1) create a new IfElse instruction, and (2) make the interpreters work backwards from the final instruction instead of forwards. Specifically, we'd construct a dependency graph on instructions (as we already do in the ParallelInterpreter), and then work backwards from the final results using a depth-first search. The depth-first search can be implemented with a stack; for an IfElse instruction, we'll make sure it gets popped off the stack twice -- the first time we evaluate the condition, and the second time we evaluate the appropriate branch.
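As a rough illustration of that scheme, here is a minimal Python sketch of demand-driven, stack-based evaluation. The Instr and IfElseInstr classes and the interpret function are hypothetical stand-ins, not CGT's actual execution-graph types; the point is just that only the chosen branch is ever evaluated.

class Instr:
    """Hypothetical ordinary instruction: value = fn(values of deps)."""
    def __init__(self, fn, deps):
        self.fn = fn
        self.deps = deps

class IfElseInstr:
    """Hypothetical lazy if-else instruction."""
    def __init__(self, cond, iftrue, iffalse):
        self.cond, self.iftrue, self.iffalse = cond, iftrue, iffalse

def interpret(final):
    """Walk backwards from the final instruction, computing only what it depends on."""
    values = {}                  # instruction -> computed value
    stack = [final]
    while stack:
        instr = stack[-1]
        if instr in values:      # already computed on an earlier visit
            stack.pop()
            continue
        if isinstance(instr, IfElseInstr):
            if instr.cond not in values:
                stack.append(instr.cond)      # first visit: demand the condition
            else:
                branch = instr.iftrue if values[instr.cond] else instr.iffalse
                if branch not in values:
                    stack.append(branch)      # second visit: demand only the chosen branch
                else:
                    values[instr] = values[branch]
                    stack.pop()
        else:
            missing = [d for d in instr.deps if d not in values]
            if missing:
                stack.extend(missing)         # demand unevaluated inputs first
            else:
                values[instr] = instr.fn(*(values[d] for d in instr.deps))
                stack.pop()
    return values[final]

# Only the taken branch fires, so the out-of-bounds access below never runs.
cond = Instr(lambda: True, [])
safe = Instr(lambda: 1.0, [])
bad = Instr(lambda: [1.0, 2.0][5], [])          # would raise IndexError if evaluated
print(interpret(IfElseInstr(cond, safe, bad)))  # -> 1.0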

Loops present another, more challenging problem. I can think of two use cases for loops: first, to implement an RNN-like model with a variable or very large number of timesteps; second, to implement an optimization algorithm like SGD. The first case requires us to have arrays that grow with the number of loop iterations. Theano uses Scan, which is a gargantuan piece of code written by Razvan Pascanu and only fully understood by him; it's also not a very convenient user interface. We need to answer a couple of questions: (1) what would a convenient python interface look like that allows one to specify loops and implement RNNs and simple optimizers, and (2) what should the underlying representation be, both at the level of the "expression graph" in python and at the level of the execution graph?

@nickjhay and @hojonathanho and I have kicked around a couple of ideas about the internal representation. For the python expression graph, we could perhaps use recursive functions in a canonical form, e.g. the primitive recursion formulation (see the sketch below). That said, we may end up falling back to something like Theano's Scan. (But hopefully it can be done more cleanly.)
For the execution graph, @hojonathanho suggested considering the approach taken by lazy functional languages, for example the Spineless Tagless G-Machine. This problem has also been addressed by dataflow computing systems, e.g. Dryad.
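To make the primitive-recursion idea concrete: a loop in that canonical form is determined entirely by an initial state and a step function. A tiny illustration in plain Python (the name prim_rec and the SGD-flavored usage are made up for this sketch, not a proposed CGT interface):

def prim_rec(init, step, n):
    # Canonical primitive-recursion form:
    #   prim_rec(init, step, 0)     = init
    #   prim_rec(init, step, n + 1) = step(n, prim_rec(init, step, n))
    if n == 0:
        return init
    return step(n - 1, prim_rec(init, step, n - 1))

# e.g. an SGD-like loop, theta_{i+1} = theta_i - lr * grad(theta_i):
# theta_final = prim_rec(theta0, lambda i, theta: theta - lr * grad(theta), n_steps)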

Having said all of that, we don't want to re-invent a compiler, and we don't want to build a full-fledged dataflow engine -- we should pick a restricted set of computations that can be performed by a simple system.

Clojure also uses the special-case approach to conditionals (see special forms in the description of evaluation): for certain forms, the arguments are evaluated in an unusual order, rather than being wrapped in thunks. Thunks seem to add unnecessary overhead to such a basic operation.
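For comparison, a thunk-based if-else wraps each branch in a zero-argument callable and forces only the chosen one; the extra closure allocation and indirect call per branch is the overhead in question. A toy Python illustration (not CGT code):

def if_else_thunked(cond, then_thunk, else_thunk):
    # Each branch is a "thunk" (zero-argument callable); only the chosen one is forced.
    return then_thunk() if cond else else_thunk()

xs = [1.0, 2.0]
# The out-of-bounds xs[5] sits inside a thunk that is never called, so no error is raised.
val = if_else_thunked(len(xs) > 5, lambda: xs[5], lambda: 0.0)   # -> 0.0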

I haven't used Theano much, but it looks like much of the functionality of Scan could be implemented by combining map and reduce into something like a transducer:

transduce(f, v, []) = (v, [])
transduce(f, v, x:xs) =
    let (v', y) = f(v,x) in
        let (v_final, ys) = transduce(f, v', xs) in
            (v_final, y:ys)

map(f, ys) = transduce(\_,x -> (nil, f(x)), nil, ys)[1]
reduce(f, v, ys) = transduce(\v,x -> (f(v,x), nil), v, ys)[0]

This seems about as hard to implement as map and reduce.
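As a sanity check, here is a direct, runnable Python transcription of the pseudocode above, with ordinary lists standing in for graph-level arrays; it is only meant to show the recursion is well-formed, not to propose a CGT op (map_ and reduce_ are just illustrative names):

def transduce(f, v, xs):
    # Threads an accumulator v through xs, returning (final accumulator, per-element outputs).
    if not xs:
        return v, []
    v2, y = f(v, xs[0])
    v_final, ys = transduce(f, v2, xs[1:])
    return v_final, [y] + ys

def map_(f, xs):
    # map keeps only the per-element outputs
    return transduce(lambda _, x: (None, f(x)), None, xs)[1]

def reduce_(f, v, xs):
    # reduce keeps only the final accumulator
    return transduce(lambda acc, x: (f(acc, x), None), v, xs)[0]

assert map_(lambda x: x * x, [1, 2, 3]) == [1, 4, 9]
assert reduce_(lambda a, b: a + b, 0, [1, 2, 3]) == 6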