Tools for MIL, a Monadic Intermediate Language

The Java code in this repository implements a set of tools for working with MIL programs, including code for generating MIL from LC or MIL source files, optimizing MIL code, and translating MIL to LLVM.

These tools are primarily intended for use as a backend for the alb compiler for Habit. (See https://github.com/habit-lang/alb.) However, they can also be used independently, as suggested by the examples described below.

A paper with more details about MIL is available via https://web.cecs.pdx.edu/~mpj/pubs/mil.html.

A first cut at a test suite for mil-tools is available from https://github.com/zipwith/mil-tester.

Acknowledgements:

Work on mil-tools during the period 2014-2018 has been supported in part by funding from the National Science Foundation, Award No. CNS-1422979.

Requirements:

The following items are required to use the code in this repository:

Java Development Kit (version >= 1.8 should be sufficient)
Apache Ant (version >= 1.9.2 should be sufficient)
The LLVM Compiler Infrastructure (versions >= 4.0 should be sufficient)

Installation:

First, we need to build the mil-tools, and the simplest way to do this (assuming you already have a suitable JDK and copy of Apache Ant installed) uses the following command sequence:

# Ensure a clean starting point:
ant clean

# Build mil-tools:
ant

[Use of ant clean is recommended after every update to ensure a consistent build.]

Next, copy the milc shell script (or the milc.bat file on Windows) into a suitable folder on your path, and then edit that file so that it includes a full path to the location of the mil-tools.jar file on your system. After these steps, you should be able to run milc (without arguments) in any directory to produce output that starts as follows (run the command yourself to see the full list of milc options):

$ milc
usage: milc [inputs|options]
inputs:  filename.milc  Load options from specified file
         filename.mil   Load MIL source from specified file (.lmil for literate)
         filename.lc    Load LC source from specified file (.llc for literate)
options: -v             verbose on
         -d             display debug messages
         -ipathlist     append items to input search path
...
$

Using mil-tools: a small example

The process for compiling an LC source file (in this case, demo/fib.lc) to MIL and then generating LLVM is illustrated by the following commands (this time assuming that you have a suitable installation of LLVM):

# Use mil-tools to compile fib.lc into LLVM:
milc -ilib --standalone demo/fib.lc -ltmp/fib.ll

# Build an executable from the generated LLVM code:
clang -o tmp/fib tmp/fib.ll demo/runtime.c

A brief explanation of the milc command line options used here:

-ilib sets the path for input files to include the lib directory, which is where the default prelude.lc and other library files are stored.
--standalone configures milc to generate a program with an entrypoint called main.
demo/fib.lc specifies the name of an LC source program.
-ltmp/fib.ll indicates that milc should generate an LLVM version of the program and write a copy of the code in the file tmp/fib.ll.

And a similarly brief explanation of the clang command line options:

-o tmp/fib sets the name (fib) and folder location (tmp) of the executable program that clang produces.
tmp/fib.ll specifies the name of the input LLVM code; of course, this is the name of the file that was generated by the preceeding milc command.
demo/runtime.c is the name of a small C source file that provides an implementation of the printWord function that is used in the original fib.lc program. Some additional examples that combine LC and C code in a single program are described later in this document.

Depending on details of your LLVM installation/platform (for example, on macOS), you may need to substitute gcc for clang in the second command here. You may also need to substitute java -jar mil-tools.jar in place of milc, depending on your platform (Windows users can substitute milc.bat).

Other methods for compiling the generated LLVM code may be useful in other settings. For example, you can use the following two commands to compile fib.ll via assembly with a specific target architecture (in this case, 32-bit x86):

# Compile the generated LLVM code:
llc -O2 -filetype=asm -march=x86 tmp/fib.ll

# Build an executable that includes the generated code:
gcc -m32 -o tmp/fib -Wl,-no_pie demo/runtime.c tmp/fib.s

You may see a message of the form warning: overriding the module target triple with ... when you run the commands above; we will need to improve the LLVM code generator to avoid this in a future release, but you should be able to ignore this message for now.

Finally, you can run the compiled program:

# Run the generated executable:
tmp/fib

The result should look something like the following, with a pair of arbitrarily chosen numbers bracketing two 144s, which have been calculated using two slightly different definitions of the Fibonnacci function:

Alternatively, you can use the following command line to run the program directly through the bytecode interpreter:

milc -ilib --standalone -x demo/fib.lc

Replace -x with -m to see the generated MIL code, or with -l to see the generated LLVM code, etc. (Or just run either milc, milc.bat or java -jar mil-tools.jar for a summary of additional command line options.)

A larger example: The priority set case study

As a slightly larger and more complex example of the pattern described above, the demo folder also includes a pair of source files based on the Priority Set case study that appears in the Habit Language Report:

prioset.lc is an LC implementation of a priority set (maximum heap) data structure, based on the code in the Habit report. (Some changes were made to include appropriate implementations for the external primitives that are needed in this program for basic operations such as arithmetic and reading/writing from memory. If we were starting instead from a Habit source program, these details would be generated automatically by the Habit compiler.)
priodemo.c is a simple C demonstration program that uses the functions from prioset.lc as a simple library.

The following commands can be used to compile and run this demonstration program (on a machine running macOS High Sierra; some changes will be required for other platforms):

# Compile the prioset.lc implementation:
milc -ilib demo/prioset.lc --llvm-main=initPrioset -ltmp/prioset.ll --target=x86_64-apple-macosx10.13.0 --64

# Compile the priodemo.c file and link with the code from prioset.lc:
clang -DWORD64 -o tmp/priodemo demo/priodemo.c tmp/prioset.ll

# Run the resulting executable:
tmp/priodemo

The following notes explain some of the command line options used here that were not already documented for the previous example:

--llvm-main=initPrioset specifies a name for the initialization function that must be used to initialize the priority set data structure before using the other functions defined in prioset.lc. The name initPrioset is used in priodemo.c, so it is essential that the same name is specified here. (Further details about the --llvm-main=... command line option are presented later in this document.)
--target=x86_64-apple-macosx10.13.0 specifies the "target triple" string that is included in the generated LLVM code to specify details about the target platform; clearly, it will be necessary to modify this part of the command line if you are using a different platform. Note that you can find the target triple that was used to configure LLVM by running the command llvm-config --host-target, but a single LLVM installation can typically accomodate multiple target types.
--64 specifies that the generated LLVM code will generate code assuming that the Word type in LC is 64 bits wide. The -DWORD64 option in the clang command line is used to ensure that the code in priodemo.c is configured in a corresponding way (in particular, by using the C long type as a representation for Word values). Alternatively, it should also be possible to generate a 32 bit version of the program by using: the --32 option for milc; a -DWORD32 option for clang; and a modified version of the --target=... option, such as replacing x86_64 with i386. IMPORTANT: It is essential that the same word size is used consistently for all LC and C code fragments that are used together in a single program. Failure to satisfy this requirement will likely result in segmentation, protection, or other faults or errors when attempting to generate and run the priodemo example.

The output that is produced by the final command in this sequence is as follows:

$ tmp/priodemo 
Priority set demo
Inserting some numbers:
Inserting 12
Inserting 5
Inserting 7
Inserting 128
Inserting 67
Removing three numbers:
Removing 128
Removing 67
Removing 12
Adding some more numbers:
Inserting 3
Inserting 32
Inserting 10
Draining the queue:
Removing 32
Removing 10
Removing 7
Removing 5
Removing 3
Done!
$

A tour through some of the mil-tools passes

mil-tools uses a sequence of compilation "passes" to process and transform input programs. The specific sequence of passes that is used for a given task depends on what kind of output is required: if a user requests LLVM output using the -l flag, for example, then mil-tools uses extra passes that would not be included if the user had just requested MIL output using the -m command line option.

For debugging purposes, however, and to help advanced users to understand how the compilation process works, it is possible to override the default and specify a custom sequence of parses using the -p command line option. We will use this in the following sequence of examples to illustrate some of the key passes that are provided by mil-tools. The sample program that we use in these demonstrations is in a source file called demo/ex.lc whose contents are as follows:

require "prelude.lc"
data Dup a      = Dup a a
entrypoint swap7 :: Dup (Bit 7) -> Dup (Bit 7)
entrypoint swap8 :: Dup (Bit 8) -> Dup (Bit 8)
swap7 = swap
swap8 = swap

swap d = case d of Dup x y -> Dup y x

entrypoint d1
d1 = Dup B001 B110

If you just want to have mil-tools load, check, and compile this program to MIL, you can use the command line option -p, as in milc -ilib -m demo/ex.lc -p. This, however, will produce quite a lot of output, and we can simplify the code a little by running it through the MIL optimizer. This can be accomplished by adding the character o to the end of the -p command line option, producing the following output (more generally, the -p option takes a sequence of letters, each of which specifies a particular pass; mil-tools will then attempt to run exactly that sequence of passes on the MIL code that is loaded in directly or generated by compiling LC source files):

$ milc -ilib -m demo/ex.lc -po
data -> (a :: *) (b :: *)
  = Func ([a] ->> [b])

data Dup (a :: *)
  = Dup a a

-----------------------------------------
-- not recursive
entrypoint d1 :: Dup (Bit 3)
d1 <-
  Dup(B001, B110)

-----------------------------------------
-- not recursive
b84 :: forall (a :: *). [Dup a] >>= [Dup a]
b84[t0] =
  assert t0 Dup
  t1 <- Dup 0 t0
  t2 <- Dup 1 t0
  Dup(t2, t1)

-----------------------------------------
-- not recursive
b0 :: forall (a :: tuple). [] >>= a
b0[] =
  halt(())

-----------------------------------------
-- not recursive
b81 :: forall (a :: *). [Dup a] >>= [Dup a]
b81[t0] =
  case t0 of
    Dup -> b84[t0]
    _ -> b0[]

-----------------------------------------
-- not recursive
k54 :: forall (a :: *). {} [Dup a] ->> [Dup a]
k54{} t0 = b81[t0]

-----------------------------------------
-- not recursive
s2 :: forall (a :: *). [Dup a] ->> [Dup a]
s2 <-
  k54{}

-----------------------------------------
-- not recursive
swap :: forall (a :: *). Dup a -> Dup a
swap <-
  Func(s2)

-----------------------------------------
-- not recursive
entrypoint swap8 :: Dup (Bit 8) -> Dup (Bit 8)
swap8 <-
  return swap

-----------------------------------------
-- not recursive
entrypoint swap7 :: Dup (Bit 7) -> Dup (Bit 7)
swap7 <-
  return swap

-----------------------------------------
-- Entrypoints: d1 swap8 swap7
$

One of the details to note here is the fact that function values like swap are wrapped up in uses of the constructor function Func, which is introduced as the result of a (builtin) data type definition data d -> r = Func ([d] ->> [r]). You might also notice that the case construct in block b81 includes a default branch (introduced with an underscore instead of a constructor name), even though the Dup type that it is matching on has only one constructor. We can eliminate these overheads by including the c pass ("constructor function rewrite") as part of the -p setting:

$ milc -ilib -m demo/ex.lc -pco
data Dup (a :: *)
  = Dup a a

-----------------------------------------
-- not recursive
entrypoint d1 :: Dup (Bit 3)
d1 <-
  Dup(B001, B110)

-----------------------------------------
-- not recursive
b80 :: forall (a :: *). [Dup a] >>= [Dup a]
b80[t0] =
  t1 <- Dup 0 t0
  t2 <- Dup 1 t0
  Dup(t2, t1)

-----------------------------------------
-- not recursive
k54 :: forall (a :: *). {} [Dup a] ->> [Dup a]
k54{} t0 = b80[t0]

-----------------------------------------
-- not recursive
swap :: forall (a :: *). [Dup a] ->> [Dup a]
swap <-
  k54{}

-----------------------------------------
-- not recursive
entrypoint swap8 :: [Dup (Bit 8)] ->> [Dup (Bit 8)]
swap8 <-
  return swap

-----------------------------------------
-- not recursive
entrypoint swap7 :: [Dup (Bit 7)] ->> [Dup (Bit 7)]
swap7 <-
  return swap

-----------------------------------------
-- Entrypoints: d1 swap8 swap7
$

After this transformation, all references to the LC level function type constructor -> have been replaced by uses of the MIL function type arrow ->>. More generally, if a source program contains a definition of a non recursive data type data T a = MkT t, then the c pass will eliminate all uses of the T a type (and all uses of the MkT constructor) by replacing them with a corresponding instance of t.

As a next step, we can eliminate polymorphism and parameterized data types from a MIL program by running the specializer, represented by the pass letter s: (We follow that here with a second optimizer run, just to clean up the resulting code, but that is not technically required. Alternatively, we could achieve a similar effect to what is shown here by using -pcso, which only uses one optimizer pass.)

$ milc -ilib -m demo/ex.lc -pcoso
data Dup0
  = Dup0 (Bit 7) (Bit 7)

data Dup1
  = Dup1 (Bit 8) (Bit 8)

data Dup2
  = Dup2 (Bit 3) (Bit 3)

-----------------------------------------
-- not recursive
b80 :: [Dup0] >>= [Dup0]
b80[t0] =
  t1 <- Dup0 0 t0
  t2 <- Dup0 1 t0
  Dup0(t2, t1)

-----------------------------------------
-- not recursive
k54 :: {} [Dup0] ->> [Dup0]
k54{} t0 = b80[t0]

-----------------------------------------
-- not recursive
swap :: [Dup0] ->> [Dup0]
swap <-
  k54{}

-----------------------------------------
-- not recursive
entrypoint swap7 :: [Dup0] ->> [Dup0]
swap7 <-
  return swap

-----------------------------------------
-- not recursive
b801 :: [Dup1] >>= [Dup1]
b801[t0] =
  t1 <- Dup1 0 t0
  t2 <- Dup1 1 t0
  Dup1(t2, t1)

-----------------------------------------
-- not recursive
k541 :: {} [Dup1] ->> [Dup1]
k541{} t0 = b801[t0]

-----------------------------------------
-- not recursive
s1 :: [Dup1] ->> [Dup1]
s1 <-
  k541{}

-----------------------------------------
-- not recursive
entrypoint swap8 :: [Dup1] ->> [Dup1]
swap8 <-
  return s1

-----------------------------------------
-- not recursive
entrypoint d1 :: Dup2
d1 <-
  Dup2(B001, B110)

-----------------------------------------
-- Entrypoints: swap7 swap8 d1
$

The generated program is longer now because there are essentially two different copies of the swap function, each operating on a different version of the Dup type. As intended, however, the program no longer contains any definitions with polymorphic types.

In order to generate working code for this program, we need to provide representations for types like Bit 7 and Bit 8 that show up in this program. In mil-tools, both of these types are actually represented as values of type Word, treating only the lower 7 or 8 bits as being significant, in each case. The r character can be used to specify a "representation transformation" pass that rewrites input programs to reveal these low level details:

$ milc -ilib -m demo/ex.lc -pcosoro
data Dup0
  = Dup0 Word Word

-----------------------------------------
-- not recursive
entrypoint swap7 :: [Dup0] >>= [Dup0]
swap7[t0] =
  t1 <- Dup0 0 t0
  t2 <- Dup0 1 t0
  Dup0(t2, t1)

-----------------------------------------
-- not recursive
entrypoint swap8 :: [Dup0] >>= [Dup0]
swap8[t0] =
  t1 <- Dup0 0 t0
  t2 <- Dup0 1 t0
  Dup0(t2, t1)

-----------------------------------------
-- not recursive
entrypoint d1 :: Dup0
d1 <-
  Dup0(1, 6)

-----------------------------------------
-- Entrypoints: swap7 swap8 d1
$

For types like Dup0, however, it is possible to provide more compact representations by replacing each of the associated data type definitions with corresponding bitdata definitions. This can be accomplished by using the letter b to specify a bitdata rewrite pass. Note that b can only be used immediately after s because it relies on information that is generated during specialization:

$ milc -ilib -m demo/ex.lc -pcosbo
bitdata Dup0 /14
  = Dup0 [ Dup00 :: Bit 7 | Dup01 :: Bit 7 ]
    -- predDup0(x :: Bit 14) = true

-- bit pattern:
--  ______________

bitdata Dup1 /16
  = Dup1 [ Dup10 :: Bit 8 | Dup11 :: Bit 8 ]
    -- predDup1(x :: Bit 16) = true

-- bit pattern:
--  ________________

bitdata Dup2 /6
  = Dup2 [ Dup20 :: Bit 3 | Dup21 :: Bit 3 ]
    -- predDup2(x :: Bit 6) = true

-- bit pattern:
--  ______

-----------------------------------------
-- not recursive
b80 :: [Dup0] >>= [Dup0]
b80[t0] =
  t1 <- Dup0 0 t0
  t2 <- Dup0.Dup0 0 t1
  t3 <- Dup0.Dup0 1 t1
  t4 <- Dup0.Dup0(t3, t2)
  Dup0(t4)

-----------------------------------------
-- not recursive
k54 :: {} [Dup0] ->> [Dup0]
k54{} t0 = b80[t0]

-----------------------------------------
-- not recursive
swap :: [Dup0] ->> [Dup0]
swap <-
  k54{}

-----------------------------------------
-- not recursive
entrypoint swap7 :: [Dup0] ->> [Dup0]
swap7 <-
  return swap

-----------------------------------------
-- not recursive
b801 :: [Dup1] >>= [Dup1]
b801[t0] =
  t1 <- Dup1 0 t0
  t2 <- Dup1.Dup1 0 t1
  t3 <- Dup1.Dup1 1 t1
  t4 <- Dup1.Dup1(t3, t2)
  Dup1(t4)

-----------------------------------------
-- not recursive
k541 :: {} [Dup1] ->> [Dup1]
k541{} t0 = b801[t0]

-----------------------------------------
-- not recursive
s1 :: [Dup1] ->> [Dup1]
s1 <-
  k541{}

-----------------------------------------
-- not recursive
entrypoint swap8 :: [Dup1] ->> [Dup1]
swap8 <-
  return s1

-----------------------------------------
-- not recursive
s2 :: Dup2.Dup2
s2 <-
  Dup2.Dup2(B001, B110)

-----------------------------------------
-- not recursive
entrypoint d1 :: Dup2
d1 <-
  Dup2(s2)

-----------------------------------------
-- Entrypoints: swap7 swap8 d1
$

If you're wondering why this version of the program looks a little bit longer, that's because bitdata types use two levels of constructor functions: for every constructor function C in a bitdata type T, there is also an associated layout type called T.C, each of which also has a constructor function with the same name.

At this point, we can add r back to our list of passes to run the representation transformation. At this point, mil-tools determines that complete Dup0 and Dup1 values can be represented within a single word (only 16 and 14 bits, respectively, are needed in each case0) and replaces the algebraic data type selectors and constructors with some appropriate bit twiddling logic, as illustrated in the final output shown below:

$ milc -ilib -m demo/ex.lc -pcosboro
-----------------------------------------
-- not recursive
entrypoint swap7 :: [Word] >>= [Word]
swap7[t0] =
  t1 <- lshr((t0, 7))
  t2 <- shl((t0, 7))
  t3 <- and((t2, 16256))
  or((t1, t3))

-----------------------------------------
-- not recursive
entrypoint swap8 :: [Word] >>= [Word]
swap8[t0] =
  t1 <- lshr((t0, 8))
  t2 <- shl((t0, 8))
  t3 <- and((t2, 65280))
  or((t1, t3))

-----------------------------------------
-- not recursive
entrypoint d1 :: Word
d1 <-
  return 14

-----------------------------------------
-- Entrypoints: swap7 swap8 d1
$

Note also that, as part of the r representation transformation, the definitions of swap8 and swap7 have been changed from definitions of function closures (using the ->> arrow) to simple block definitions, which are more easily accessed as standard functions from other LLVM languages, as can be seen if we switch from MIL to LLVM output in the previous example: (It is possible to generate both with a single mil-tools invocation, but we only generate one at a time here so that the output is a little easier to follow.)

$ milc -ilib -l demo/ex.lc -pcosboro
@d1 = global i32 14

define i32 @swap7(i32 %r0) {
entry:
  br label %swap7

swap7:
  %r2 = lshr i32 %r0, 7
  %r4 = shl i32 %r0, 7
  %r3 = and i32 %r4, 16256
  %r1 = or i32 %r2, %r3
  ret i32 %r1
}

define i32 @swap8(i32 %r0) {
entry:
  br label %swap8

swap8:
  %r2 = lshr i32 %r0, 8
  %r4 = shl i32 %r0, 8
  %r3 = and i32 %r4, 65280
  %r1 = or i32 %r2, %r3
  ret i32 %r1
}
$

Function libraries, standalone programs, and variations

The tools described here can be used in various ways to support the development of software libraries, standalone programs, or variations in between. The following examples illustrate some of the possibilities, and the corresponding use of LC/MIL features and milc command line options.

Function Libraries

Our first example is a small library of functions that are not complete programs in themselves, but could potentially be linked in as useful components of a variety of different applications. This particular library, whose source code is in demo/funlib.lc provides two implementations of the Fibonnaci function, and two implementations of the factorial function, so it will probably not be useful in many real world settings, but should provide some familiar examples and should also be sufficient to demonstrate the main ideas:

$ cat demo/funlib.lc 
require "prelude.lc"

-- A standard recursive version of the Fibonnaci function:
entrypoint fib :: Word -> Word
fib n = if eq n 0 then 0
        else if eq n 1 then 1
        else add (fib (sub n 1)) (fib (sub n 2))

-- An iterative version of the Fibonnaci function:
entrypoint itfib :: Word -> Word
itfib = let loop a b n = if eq n 0 then a else loop b (add a b) (sub n 1)
        in loop 0 1

-- A traditional recursive implementation of the factorial function:
entrypoint recfac :: Word -> Word
recfac n = if eq n 0 then 1 else mul n (recfac (sub n 1))

-- An iterative version of the factorial function:
entrypoint itfac :: Word -> Word
itfac = let loop a n = if eq n 0 then a else loop (mul a n) (sub n 1)
        in loop 1
$

Perhaps the most important detail here is that each of the four functions is declared as an entrypoint. (The type signatures that are provided for each function here could be useful as documentation, but they can also be inferred automatically, so are not actually required.) As a result, when this program is compiled using the following commands, it will produce an LLVM output (in the file tmp/funlib.ll) that contains an externally visible function definition for each of the four LC functions.

$ milc -ilib demo/funlib.lc -ltmp/funlib.ll
$ clang -c -o tmp/funlib.o tmp/funlib.ll
warning: overriding the module target triple with ...
1 warning generated.
$

[As before, assuming you have a conventional LLVM installation, you should be able ignore the warning message shown here.]

The command sequence shown above follows the use of milc with a clang command that builds an object file tmp/funlib.o that can be linked with other programs. Alternatively, we can write a C program that uses the functions in funlib.lc and then use a single clang command to compile and link it with the tmp/funlib.ll source file:

$ cat demo/funlibtest.c 
#include <stdio.h>

extern int fib(int);
extern int itfib(int);
extern int recfac(int);
extern int itfac(int);

int main(int argc, char** argv) {
  for (int i=0; i<10; i++) {
    printf("%d\t%d\t%d\t%d\t%d\n",
           i, fib(i), itfib(i), recfac(i), itfac(i));
  }
}
$ clang -o tmp/funlibtest demo/funlibtest.c tmp/funlib.ll 
$

[Note: On macOS, you may be able (or need) to substitute gcc for clang in the command lines shown here, depending on details of your LLVM installation.]

Notice that the C code references the functions we have seen defined in funlib.lc as regular C functions with conventional function prototypes. This is possible because, if an entrypoint in the LC code is a curried function, then it will be exported from the generated LLVM code as a fully uncurried function (i.e., as a function that takes all of its arguments at the same time, as is the norm in languages like C that do not directly support higher order functions). The result is an executable that blends code from the two different source code components into a single program:

$ tmp/funlibtest 
0       0       0       1       1
1       1       1       1       1
2       1       1       2       2
3       2       2       6       6
4       3       3       24      24
5       5       5       120     120
6       8       8       720     720
7       13      13      5040    5040
8       21      21      40320   40320
9       34      34      362880  362880
$

Initialization functions

Although there is no need for this in our funlib example, it is sometimes necessary to perform some kind of initialization steps before some (or all) of the entrypoints in a library are used for the first time. The following shows a contrived example that includes entrypoints called fib12 and fib15 that are supposed to contain the 12th and 15th Fibonnaci numbers, respectively:

$ cat demo/needinit.lc 
require "prelude.lc"

entrypoint fib :: Word -> Word
fib n = if eq n 0 then 0
        else if eq n 1 then 1
        else add (fib (sub n 1)) (fib (sub n 2))

entrypoint fib12, fib15
fib12  = fib 12
fib15  = fib 15

$

But now suppose that we link this with a C program that reads the values of fib12 and fib15 and displays them on the screen. At what point will those values be calculated? If we compile this program using a similar milc command line to the previous example, we get an error informing us that the generated code requires an initialization function:

$ milc -ilib demo/needinit.lc -ltmp/needinit.ll
ERROR: LLVM program requires initialization function (set using --llvm-main=NAME)

$

In this particular example, an initialization function is needed to calculate values for fib12 and fib15, and to save those values in the corresponding variables. More generally, an initialization function is needed for any program that requires one time initialization of variables or memory areas before other parts of the program are used.

The error message also suggests that we can fix this problem by using an --llvm-main command line option, which is precisely what we do in the following, choosing to use the name initialize for this function:

$ milc -ilib demo/needinit.lc -ltmp/needinit.ll --llvm-main=initialize
$

Now we can write C programs that use the needinit library. And, so long as we ensure that the initialize() function is called before the values of fib12 and fib15 are accessed, then those variables will be properly initialized and everything should work as expected:

$ cat demo/printfibs.c 
#include <stdio.h>

extern void initialize();
extern int fib12, fib15;

int main(int argc, char** argv) {
  initialize();
  printf("fib(12)=%d, fib(15)=%d\n", fib12, fib15);
}

$ clang -o tmp/printfibs demo/printfibs.c tmp/needinit.ll
$ tmp/printfibs 
fib(12)=144, fib(15)=610
$

In practice, it may be difficult to determine whether or not an initialization function is actually required, short of running a milc command line like the one above to see if an error is reported. For example, if we replace the fib function in needinit.lc with the more efficient itfib function, then the mil-tools optimizer in actually able to compute the values of fib12 and fib15 at compile-time, and there will not be any need for an initialization function. For this reason, it is often a good practice to specify an initialization function name using the --llvm-main option and to encourage users of the library to call that function before attempting to accessing any of its other features. If the library does not actually require special treatment, then milc will just generate an initialization function with an empty body. But if something subsequently changes in the program that does require proper initialization, then users of the library, at least if they are following your advice, will already have included the appropriate function call in their code.

Writing standalone programs

In the previous examples, we wrote code in LC that was accessed and executed from a main function written in C. But what if we wanted to write our main program directly in LC, as in the following simple example for displaying the value of fib 12:

$ cat demo/program.lc
require "prelude.lc"
require "io.mil"

itfib :: Word -> Word
itfib  = let loop a b n = if eq n 0 then a else loop b (add a b) (sub n 1)
         in loop 0 1

-- A program to print the 12th Fibonnaci number:
export main
main = printWord (itfib 12)
$

This program does not specify any explicit entrypoints, but it does contain a definition for a main routine that is marked as an export. This allows us to compile the program using the --standalone flag for milc:

$ milc -ilib --standalone demo/program.lc -ltmp/program.ll
$ clang -o tmp/program tmp/program.ll demo/runtime.c
warning: overriding the module target triple with ...
1 warning generated.
$ tmp/program
144
$

Note that, despite what the name --standalone might suggest, the clang command line shown here does actually mention the C source file demo/runtime.c, which provides an implementation for the printWord function in the original LC code. So, in fact, we are still mixing LC and C code, but this time we are calling the C code from the LC code, which is the opposite to what we saw in the earlier examples.

The --standalone flag is really just a shorthand for the following two command line options:

--llvm-main=main --mil-main=main

As we have already seen, the first of these specifies that the generated LLVM code should include an initialization function called main. The second identifies a specific definition, also with the name main, but this time in the input program, that will be executed at the end of the initialization function. At most one main definition can be specified in this way, but it is possible to use a value with a different name, so long as it is marked as an export in the source program and identified using an explicit --mil-main=NAME setting on the command line.) While --standalone provides sensible defaults for both of these command line options, the separate --mil-main and --llvm-main options provide useful additional control in some more advanced settings.

habit-lang / mil-tools