Category-Syntax

(work in progress, not all features documented below are implemented)

Category-Syntax provides a Monad-like syntax for Category and various versions of "Arrow without arr". Once complete, it should looks like this:

rearrange :: Either (Either a1 a2) (Either b1 b2)
          -> Either (Either a1 b1) (Either a2 b2)
rearrange = $(syntax [|do
    ((x1,x2), (y1,y2)) <- getInput
    returnC ((x1,y1), (x2,y2))
  |])

Which should expand to this:

rearrange :: Either (Either a1 a2) (Either b1 b2)
          -> Either (Either a1 b1) (Either a2 b2)
            -- ((x1, x2), (y1,y2)))
rearrange = associate
            -- (x1, (x2, (y1,y2)))
        >>> second coassociate
            -- (x1, ((x2,y1), y2))
        >>> second (first swap)
            -- (x1, ((y1,x2), y2))
        >>> coassociate
            -- ((x1, (y1,x2)), y2)
        >>> first coassociate
            -- (((x1,y1), x2), y2)
        >>> associate
            -- ((x1,y1), (x2,y2))
        >>> returnC  -- aka Category.id
            -- ((x1,y1), (x2,y2))

In the above example, two structural rules have been used: associativity and commutativity. Some substructural languages support more rules, some less.

One important feature of Category-Syntax is that it supports more than one set of rules. For example, if you write code which uses a variable more than once, then the variable will be duplicated via the diag method, which will impose a Contract constraint on the generated code. If you try to use a variable more than once in a context which doesn't support the contraction rule, you will get a type error.

Should I use Category-Syntax?

Most DSLs are monads. Use do-notation if you can.

If your language is not a Monad, then it's probably an Applicative. Use <$> and <*> if you can.

If your language is not a Monad nor an Applicative, then your language is probably a Category. Now we're talking! In general, if your language is implemented by a datatype which is more than a Category but less than an Arrow, then programs in your language will probably be more readable if they use Category-Syntax.

Examples

Bijections

Let's implement the famous "seven trees in one" isomorphism. Dan Piponi's post on the subject is very readable, but since the isomorphism requires a large number of steps, the code he builds is unfortunately quite long and hard to read. With Category-Syntax, we can do much better.

We will need a Category for isomorphisms, and a data structure for trees:

data Bij a b = Bij (a -> b) (b -> a)
inverse :: Bij a b -> Bij b a
instance Category Bij where ...

data Tree a = Leaf a | Branch (Tree a) (Tree a)
tree :: Bij (Tree a) (Either a (Tree a, Tree a))

In his blog post, Dan implements commute, associate, associate', and liftLeft. These correspond to the structural operations swap, associate, coassociate, and first, which we need to implement ourselves because we are using a custom category.

instance PFunctor Bij Either where first = ...
instance QFunctor Bij Either where second =...
instance Bifunctor Bij Either
instance Symmetric Bij Either where swap = ...
instance Associative Bij Either where ...

Now that we have implemented those structural rules, we can let Category-Syntax figure out when they need to be used. This greatly simplifies the code for describing the isomorphism, as we can now focus on the calls to tree and inverse tree instead of having to constantly rearrange the variables like in Dan's version.

For example, here is one especially hairy section from Dan's implementation.

step15' :: T6 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0 -> T7 :+ T5 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0
step15' = liftLeft5 (commute . assemble')

shuffle5' :: T7 :+ T5 :+ T4 :+ T3 :+ T2 :+ T1 :+ T0 -> T7 :+ T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4
shuffle5' = swap23 . liftLeft swap23 . liftLeft2 swap23 . liftLeft3 swap23

shuffle4' :: T7 :+ T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 -> T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7
shuffle4' = swap23 . liftLeft swap23 . liftLeft2 swap23 . liftLeft3 swap23 . liftLeft4 swap23 . liftLeft5 commute

step14' :: T5 :+ T3 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7 -> T4 :+ T2 :+ T1 :+ T0 :+ T4 :+ T7
step14' = liftLeft5 (assemble . commute)

And here is the equivalent part from our Category-Syntax version.

    (t5', t7) <- tree t6'
    t4' <- inverse tree (t3, t5')

Even with such drastic simplifications, the full seven-trees-in-one isomorphism is still annoyingly long, but at least it's a lot easier to read now.

iso :: Bij T1 T7
iso = $(syntax [|
    -- in this function, primed variables are consumed immediately
    t1' <- getInput
    
    (t0, t2') <- tree t1'
    (t1, t3') <- tree t2'
    (t2, t4') <- tree t3'
    (t3, t5') <- tree t4'
    (t4, t6') <- tree t5'
    (t5', t7) <- tree t6'
    
    -- still in scope: t0,t1,t2,t3,t4,  t7
    
    t4' <- inverse tree (t3, t5')
    t3' <- inverse tree (t2, t4')
    t2' <- inverse tree (t1, t3')
    t1  <- inverse tree (t0, t2')
    
    -- still in scope: t1,  t4,  t7
    
    (t6', t8) <- tree t7
    (t5', t7) <- tree t6'
    (t4', t6) <- tree t5'
    (t3', t5) <- tree t4'
    
    -- still in scope: t1,  t4,t5,t6,t7,t8
    
    t2' <- inverse tree (t1, t3')
    t3' <- inverse tree (t2', t4)
    t4' <- inverse tree (t3', t5)
    t5' <- inverse tree (t4', t6)
    t6' <- inverse tree (t5', t7)
    t7' <- inverse tree (t6', t8)
    
    returnC t7'
  |])

In order for the above to be a valid isomorphism, each variable must be used exactly once. Since Bij does not have a Contract instance, the fact that the above type-checks guarantees that variables match-up correctly.

Wire Diagrams

Another great post from Dan Piponi is about untangling knots. He uses do-notation to represent a knot as a collection of curves and overlapping sections:

example (a,b) = do
  (c,d) <- over (a,b)
  (e,f) <- cap
  (g,h) <- over (c,e)
  (i,j) <- over (f,d)
  (m,n) <- cap
  (k,l) <- cap
  (q,r) <- over (h,k)
  (s,y) <- over (l,i)
  (o,p) <- over (n,g)
  (t,u) <- under (p,q)
  (v,w) <- under (r,s)
  (x,z) <- over (y,j)
  cup (o,t)
  cup (u,v)
  cup (w,x)
  return (m,z)

We give names to the strands coming out of each section, and we use each name exactly once, as one of the two arguments of the next section in which the strand takes part.

With so many variables, it is easy for a typo to cause a variable to be used more than once, but with do-notation, the mistake will not be caught. Let's use Category-Syntax to create a safer knot-description language.

data KnotSection a b where
  Over  :: KnotSection (a,b) (b,a)
  Under :: KnotSection (a,b) (b,a)
  Cap :: KnotSection () (a,a)
  Cup :: KnotSection (a,a) ()

type Knot = Linear KnotSection
over = linear Over
under = linear Under
cap = linear Cap
cup = linear Cup

The datatype KnotSection is not itself a Category, because it only represents the ways in which strands can interact, not the ways in which those interactions can be composed. To support those compositions, we use Linear to lift the base operations of KnotSection into a category with a particular structure. The structure for Linear is a "free symmetric monoidal category", that is, a structure in which variables can be freely paired and rearranged. Without the extra structure of a "cartesian category", which allows variables to be dropped and duplicated, the variables of a symmetric monoidal category must be used exactly once.

We can now describe our knot using almost the same code as before, except this time we can't accidentally drop or reuse a variable:

example' :: Knot (a,b) (m,z)
example' = $(syntax [|do
    (a,b) <- getInput
    (c,d) <- over (a,b)
    (e,f) <- cap ()
    (g,h) <- over (c,e)
    (i,j) <- over (f,d)
    (m,n) <- cap ()
    (k,l) <- cap ()
    (q,r) <- over (h,k)
    (s,y) <- over (l,i)
    (o,p) <- over (n,g)
    (t,u) <- under (p,q)
    (v,w) <- under (r,s)
    (x,z) <- over (y,j)
    () <- cup (o,t)
    () <- cup (u,v)
    () <- cup (w,x)
    returnC (m,z)
  |])

If we make a mistake, we get a type error.

-- No instance for (Contract Knot) arising from a use of ‘diag’
mistake :: Knot (a,b) (m,z)
mistake = $(syntax [|do
    (a,b) <- getInput
    () <- cup (a,b)
    returnC (a,b)
  |])

Reference Counting

When a variable is dropped or reused, Category-Syntax inserts the corresponding structural operation between two user actions. By interpreting those structural operations accordingly, it should be possible to determine when a resource is last used, so it can be released as early as possible.

For simplicity, let's restrict our attention to a single variable. That is, if the variable isn't used at all, we want to free our resource immediately after the variable is bound, and if the variable is used as input to a few user actions, we want to free our resource immediately after the last of those actions.

For concreteness, here is a toy language in which there can only be one variable in scope, because every user action yields ().

refCountExample :: RefCounted Int ()
refCountExample = $(syntax [|do
    x <- getInput
    () <- noop ()
    () <- useVar x
    () <- useVar x
    () <- noop ()
    () <- useVar x
    -- the resource should be released here
    () <- noop ()
    () <- noop ()
    returnC ()
  |])

Let's begin with the easy case: if a variable is never used, a call to fst or snd is inserted immediately after the variable is bound. That's our cue for releasing the resource:

instance Weaken (,) RefCounted where
    fst = RefCounted $ \(x,y) -> do
        releaseResource y
        return x
    snd = RefCounted $ \(x,y) -> do
        releaseResource x
        return y

If a variable is used at least once, then a call to diag is inserted before all uses of the variable except for the last. This call creates a copy of the variable, so that's our cue for increasing its reference count. We assume that the internal representation of RefCounted has methods for incrementing and decrementing an associated counter, and that the counter is already initialized to 1.

instance Contract (,) RefCounted where
    diag = RefCounted $ \x -> do
        increaseCounter x
        return (x,x)

Finally, the last time a variable is used, the variable gets consumed by the user action. There is no special structural operation for variable consumption, so we need every possible user action to decrement the counter and possibly release the resource. In the case of our toy language, there is only one:

useVar :: RefCounted Int ()
useVar = RefCounted $ \x -> do
    r <- reallyUseVar x
    decreaseCounter x
    remaining <- readCounter x
    when (remaining == 0) $ do
      releaseResource x
    return r

Here is our example code again, with all the intermediate calls added by Category-Syntax.

refCountExample :: RefCounted Int ()
                                    -- counter starts at 1
refCountExample = coidl             -- create ()
              >>> first noop
              >>> idl               -- drop ()
              >>> diag              -- increaseCounter, now 2
              >>> first (useVar x)  -- decreaseCounter, now 1
              >>> idl               -- drop ()
              >>> diag              -- increaseCounter, now 2
              >>> first (useVar x)  -- decreaseCounter, now 1
              >>> first noop
              >>> idl               -- drop ()
              >>> useVar x          -- decreaseCounter, now 0
                                    -- releaseResource
              >>> noop
              >>> noop
              >>> returnC

Installation

To install the development version, clone this repository and use cabal install to compile Category-Syntax and its dependencies.

gelisam / category-syntax