README for Decaf Compiler

PACKAGE INFO
==================
DCC uses the GNU Build System, which automates many of the tasks involved in
ensuring source portability and detecting missing dependencies. Autoconf
generates the configure script, which checks features of the system such as
libraries, standard library functions, types, headers, and the presence of
compilers. Automake implements a syntax that is clearer and more powerful
than that of plain Makefiles; it is used with Autoconf and the configure
script to adapt the generated Makefiles to the system's configuration.
Libtool is used to create libraries and ensures that the type of library
built (static or shared, whether to use position-independent code, etc.) is
compatible with the system. Furthermore, Automake has a useful testsuite
feature. Finally, the ./configure, make, make install sequence of invocations
is supported by many package management systems.

Since I am developing and building DCC on two very different platforms
(64-bit Mac OS X and Linux), the GNU Build System's support for source
portability is especially useful.

EXECUTABLES
===================
Two executables are built:

    dcc       - the Decaf Compiler
    dcc_i386  - the Decaf Compiler using x86 code generation

Executables may be found in the src directory.

SIMPLE BUILDING
===================
To configure and build automatically, run the "configure" script and then
make:

    ./configure
    make

To install the executables (by default into /usr/local/bin, /usr/local/lib,
etc.), run:

    make install

To change the directory dcc is installed to, pass the --prefix option to
configure:

    ./configure --prefix=/path/to/installdir

If you have trouble running gdb on what appear to be the executables before
running make install, it may be because Libtool built wrapper scripts around
the actual executables (which are located in .deps). To use the actual
executables, run make install.
As of pp3, this should no longer apply, since Libtool is no longer used.

If you have more than one compiler on your system, you may explicitly set the
CC and CXX environment variables before running configure and make. For
example (assuming sh/bash/ksh):

    CC=gcc ./configure
    make

Similarly, using csh:

    setenv CC gcc
    ./configure
    make

To clean:

    make clean

For additional configuration options, see:

    ./configure --help

TESTING
========
To run all the tests:

    make check

ADDITIONAL SCRIPTS
=======================
There are two scripts used to initialize the Autotools scripts and to clean
the files the Autotools generate. These are especially useful because keeping
the generated files around makes the source tree harder to manage,
particularly since I am maintaining a local git repository.

To initialize the GNU Build System for DCC, run:

    bootstrap

To clean all Autotools-generated files, run:

    clean

SEMANTIC ANALYZER
====================
I chose a spaghetti stack to implement the symbol table for the semantic
analyzer. The spaghetti stack is first populated with declaration information
during the first pass. During the second pass, the parent pointers on the
SymTable nodes are used to set up the class hierarchy. During this pass, a
"vtable" of sorts for each ClassDecl is also populated with all the functions
from the interfaces the class implements. Most declaration conflicts are
caught during these first two passes.

During the third pass, the rest of semantic analysis is performed. During the
recursive unwind on the AST, "return types" are set by child nodes and then
used by parent nodes for type checking. Error recovery uses the error type,
which is set if errors are found while checking the current node. However,
propagation of errors is suppressed by making the error type compatible with
all other types.
The error type is made compatible with all other types because I assume that
once the code fragment containing the source of the error is fixed, the
nodes higher up will be error-free as well. If not, the new error will be
caught when the compiler is run again.

EXTENSIONS
====================
Progress
------------------
    Bit operators       [DONE       ]
    Octal ints          [IN PROGRESS]
    Closures            [IN PROGRESS]
    x86 codegen         [IN PROGRESS]
    Linker/loader       [PLANNED    ]
    Character literals  [PLANNED    ]
    Class protection    [PLANNED    ]

OTHER SOURCE MODIFICATIONS
=============================
I modified ParseCommandLine() to use getopt, which facilitates testing
different components of the compiler through the -t flag. For example,
running dcc with the -tparser option runs the compiler only up to the parser
and dumps the parse tree a la pp2. This makes it easy to run the test suites
incrementally while I go back and improve code from earlier assignments and
work on the extensions.

IR GENERATION
================
I augmented the semantic checking constructs, in particular the symbol
tables, to aid with IR generation in pp4. During the recursive Emit calls,
Location objects are added to the Symbols for variables so they can be
looked up later. I added a simple class, FrameAllocator, that keeps track of
the offsets of variables relative to the fp on the stack, to the gp
globally, or to the base of a class.

IR generation for classes is done in two passes. First, as in semantic
checking, the class hierarchy and inheritance tree are set up using the
EmitSetup method. During this pass, the offsets of all methods, inherited
methods, and class variables are computed. During the second pass, each
method is emitted. To account for parent classes declared after a child
class, the child calls its parent's EmitSetup method recursively upward when
necessary; a simple check ensures that the main body of EmitSetup runs only
once. Class labels are prefixed with C_, and functions with F_.
This way, class names and function names will not conflict with builtins or
instructions. One area for improvement would be to ensure that variable
names also cannot conflict with anything.