HigherOrderCO / HVM

A massively parallel, optimal functional runtime in Rust

Home Page: https://higherorderco.com

Thoughts about supercompilation and optimal reduction

Hirrolot opened this issue

I've been interested in these two topics lately. At first sight, they seem to be very different (and they are!), but I've come to conjecture that they can be complementary.

Supercompilation is a program transformation technique that is usually run at compile-time [1]. It builds a graph (a (partial) process tree) approximating all possible execution states of a particular program, optimizes this graph by removing computational redundancy, and outputs a residual program with a simpler and (hopefully) more efficient structure. Even though supercompilation mimics run-time execution, it can be made terminating even for a Turing-complete language by folding and generalizing terms.
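
To make the "residual program" part concrete, here is a minimal before/after sketch in Haskell (my own illustration, not the output of any particular supercompiler): driving the nested call and folding the repeated configuration removes the intermediate list.

```haskell
-- Input program: the nested call builds an intermediate list.
app :: [a] -> [a] -> [a]
app []       ys = ys
app (x : xs) ys = x : app xs ys

doubleApp :: [a] -> [a] -> [a] -> [a]
doubleApp xs ys zs = app (app xs ys) zs

-- A plausible residual program: one traversal of xs, no intermediate list.
doubleApp' :: [a] -> [a] -> [a] -> [a]
doubleApp' []       ys zs = app ys zs
doubleApp' (x : xs) ys zs = x : doubleApp' xs ys zs

main :: IO ()
main = print (doubleApp [1, 2] [3] [4 :: Int] == doubleApp' [1, 2] [3] [4])
```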

Optimal reduction is a technique to reduce lambda terms to their beta normal form. It does so by building a graph for an initial term (an interaction net), then transforming the graph with a fixed set of local rewrite rules. When the job is done, it outputs a residual program representing a beta normal form of the initial term, which has a simpler structure. Due to the halting problem, optimal reduction is not guaranteed to terminate on every input term.
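
This is not optimal reduction itself (that operates on interaction nets, with fan nodes handling partial sharing), but the basic duplication-vs-sharing intuition it formalizes can be sketched at the source level; the names below are my own:

```haskell
-- `costly` stands in for an expensive redex.
costly :: Int -> Int
costly n = sum [1 .. n]

-- Textually substituting the argument of (\x -> x + x) (costly 1000000)
-- duplicates the redex, so the work is described twice:
duplicated :: Int
duplicated = costly 1000000 + costly 1000000

-- With sharing, both occurrences point at one graph node, reduced once:
shared :: Int
shared = let x = costly 1000000 in x + x

main :: IO ()
main = print (duplicated == shared)
```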

A supercompiler can be understood as a program that inlines every single thing. An optimal evaluator can be understood as a program that shares every single thing. So, in principle, these two techniques are diverging in this matter: while a supercompiler attempts to remove sharing, an optimal evaluator does the opposite.

One of the most dangerous problems with supercompilation is that, because it is so careless about sharing, it can output a term that is much larger than the input term. This is not a problem for optimal reduction: an interaction net can stay very compact even when the term it residualizes to is very large. Therefore, one potential approach to the supercompilation code bloat problem would be to escape the syntactical universe of input terms and instead manipulate a representation that permits an all-encompassing notion of sharing. In a sense, Lamping's optimal reduction already provides such a mechanism.
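
A hypothetical toy example of the blow-up (my own, deliberately small): inlining `big` at every occurrence copies its whole body into the residual term, and the copies compound as calls are nested, whereas a graph representation would keep a single shared node for each repeated subterm.

```haskell
big :: Int -> Int
big x = x * x + 3 * x + 7          -- imagine something much larger here

-- Input: `big` is mentioned twice but defined once.
before :: Int -> Int
before x = big (big x)

-- Residual term after inlining everything: the body of `big` now appears
-- three times, and one more level of nesting roughly triples the size again.
after :: Int -> Int
after x =
  (x * x + 3 * x + 7) * (x * x + 3 * x + 7)
    + 3 * (x * x + 3 * x + 7) + 7

main :: IO ()
main = print (all (\x -> before x == after x) [0 .. 10])
```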

It would be pretty interesting to see how the techniques of optimal reduction can be applied to supercompilation. Thanks to sharing, a residual program could be kept "as small as possible" in some sort of graph representation; when it is given run-time data to act upon, it could then be evaluated by some sort of graph machine.

Anyhow, if you find the discussion interesting, please share (pun unintended!) any thoughts about it.


By the way, the README section "Is HVM's optimality only relevant for weird, academic λ-encoded terms?" demonstrates a truly interesting example of annihilating an intermediate list for the map f . map g term. In "Rethinking Supercompilation", Neil Mitchell demonstrates the same term encoded as follows (section 2):

root g f x = map g (map f x)

map f [] = []
map f (x : xs) = f x : map f xs

The suggested supercompiler is able to optimize this into a single-pass function (section 2.3.2):

root g f x = case x of
    [] -> []
    z : zs -> g (f z) : root g f zs

This is computationally identical (modulo the order of f and g) to HVM's result, map (f . g).
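
For reference, here is a runnable rendering of both versions (the type signatures and the equality check are mine; I use Prelude's map and rename the residual function to root' so both fit in one module):

```haskell
-- The original two-pass definition from section 2:
root :: (b -> c) -> (a -> b) -> [a] -> [c]
root g f x = map g (map f x)

-- The single-pass residual from section 2.3.2 (renamed to root'):
root' :: (b -> c) -> (a -> b) -> [a] -> [c]
root' g f x = case x of
  []     -> []
  z : zs -> g (f z) : root' g f zs

main :: IO ()
main = print (root (+ 1) (* 2) [1 .. 10 :: Int] == root' (+ 1) (* 2) [1 .. 10])
```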

I believe that even a simple supercompiler would also be able to optimize your next example, foldr (.) id funcs. So, in a sense, HVM and supercompilers share a common set of optimizations.
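
For completeness, here is my reading of that example (assuming funcs :: [a -> a]) next to the kind of single-pass function a supercompiler could plausibly residualize it into (my guess, not verified output):

```haskell
-- The term from the README example: compose a list of functions.
composeAll :: [a -> a] -> a -> a
composeAll funcs = foldr (.) id funcs

-- A single-pass version that applies the functions directly, without first
-- building the chain of (.) closures:
composeAll' :: [a -> a] -> a -> a
composeAll' []       x = x
composeAll' (f : fs) x = f (composeAll' fs x)

main :: IO ()
main = print (composeAll [(+ 1), (* 2)] (10 :: Int) == composeAll' [(+ 1), (* 2)] 10)
```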

Footnotes

  1. In theory, it could also be run at run-time, but I haven't seen a single experiment doing so.