omasanori/cps-generator

* Continuation-Passing Style Code Generator             -*- outline -*-

This directory contains a general, order-independent, redex-free
continuation-passing style code generator, available for copying under
the terms of a three-clause BSD-style licence.

cps.scm                 General code generator and notes on usage.
cps-sexp-target.scm     Example S-expression output format.
sexp-to-cps.scm         Example use of the code generator for an
                          S-expression input format, using
                          <http://synthcode.com/scheme/match.scm>
                          to match input patterns.

The terms in the above summary have the following specific meanings:

** Code Generator

This is a set of procedures for generating code in a
continuation-passing style.

** Continuation-Passing Style

The input language is some recursive expression structure with every
continuation implicit; the output language is rigidly structured with
alternating control and data nodes, such as combinations or
conditionals (control) and variable references or abstractions (data),
and with every continuation explicit.  The rigid, more uniform
structure of programs can make code to analyze them easier to write and
to understand, because it has fewer cases; explicating entities that
were formerly implicit can make the compiler's data structures clearer.

** Redex-Free

A redex is a REDucible EXpression.  For example, a naive CPS
transformation of

  (lambda (a)
    (f (+ 1 a)))

is

  (lambda (c a)
    ((lambda (c0) (c0 1))
     (lambda (x0)
       ((lambda (c1) (c1 a))
        (lambda (x1)
          (+ (lambda (t) (f c t)) x0 x1))))))

The terms ((lambda (c0) (c0 1)) (lambda (x0) ...)) and ((lambda (c1)
(c1 a)) (lambda (x) ...)) are reducible expressions, because we
trivially have enough syntactic information to yield

  (lambda (c a)
    (+ (lambda (t) (f c t)) 1 a))

instead.  This code generator does not generate any redexes (or
redices?) that were not present in the input.

** Order-Independent

This code generator does not impose any particular order on the
evaluation of subexpressions of combinations.  For example, a
(redex-free) CPS transformation that orders evaluation of
subexpressions left-to-right would transform

  (lambda (x)
    (f (g x) (h x)))

into

  (lambda (c x)
    (g (lambda (t0) (h (lambda (t1) (f t0 t1)) x)) x)),

and thereby lose valuable information from the source program about
what sets of expressions may be evaluated in any order.  Instead, this
code generator generates

  (let-values (((t0) (lambda (c0) (g c0 x)))
               ((t1) (lambda (c1) (h c1 x))))
    (f t0 t1)),

where LET-VALUES has a slightly unusual meaning here: the right-hand
expression of each clause is not an expression of a value for the
clause's variable, but an expression for a procedure that takes a
continuation and passes to that continuation a value for the clause's
variable.  (There is a similar construction for LETREC-VALUES.)

If you want to order the subproblems, you can do that by changing the
code to use the commented CPS/GENERATE-SUBPROBLEMS-IN-ORDER procedure
rather than CPS/GENERATE-SUBPROBLEMS.

** General

The code generator is independent of the formats of the input and
output languages.  For example, they might both be some restricted form
of S-expressions, or they might be directed control flow graphs.

* Shortcomings

This approach fails to transmit information about the reification of
continuations between branches of control flow; it thus requires a
subsequent global analysis to transmit this information, or else a
naive compiler would cons needless closures (possibly on the stack) for
certain join point continuations.  E.g., for

   (lambda (x y z)
     (f (or x (frob y)) z)),

this code generator will generate

   (lambda (c x y z)
     (let ((j (lambda (t) (f c t z))))
       (if (true? x)
           (j x)
           (frob j y)))).

It would be better to generate

   (lambda (c x y z)
     (let ((j (lambda (t) (f c t z))))
       (if (true? x)
           (j x)
           (frob (lambda (t) (j t)) y)))),

so that a naive compiler could defer consing a closure to the
alternative branch of the conditional, because J is used only in an
operator position and thus serves only for control, not for data.  In a
control flow graph representation of the program, such as that used by
LIAR, J would not exist at all as a variable; instead the node for the
truth test of X would have an edge directly to the combination (F T Z),
while the alternative branch would have a continuation with an edge
into that combination.

* Copying

Copyright (c) 2009, Taylor R. Campbell.

Verbatim copying and distribution of this entire article are permitted
worldwide, without royalty, in any medium, provided this notice, and
the copyright notice, are preserved.
omasanori / cps-generator

About

Languages